RELATED APPLICATIONS

This application is a Continuation-In-Part of U.S. application Serial No.
________, entitled GENETIC SEQUENCES AND PROTEINS RELATED TO
ALZHEIMER'S DISEASE (Inventors: Peter H. St. George-Hyslop, Johanna N.
Rommens and Paul E. Fraser), filed on June 28, 1995, which was a
Continuation-in-Part of U.S. Application Serial No. 08/431,048, filed April 28, 1995.



FIELD OF THE INVENTION

The present invention relates generally to the field of neurological and
physiological dysfunctions associated with Alzheimer's Disease. More particularly,
the invention is concerned with the identification, isolation and cloning of the gene
which when mutated is associated with Alzheimer's Disease, as well as its transcript,
gene products and associated sequence information and neighbouring genes. The
present invention also relates to methods of diagnosing and detecting carriers
of the gene, Alzheimer's Disease diagnosis, gene therapy using recombinant
technologies and therapy using the information derived from the DNA, protein, and
the metabolic function of the protein.



BACKGROUND OF THE INVENTION

In order to facilitate reference to various journal articles, a listing of the
articles is provided at the end of this specification.

Alzheimer's Disease (AD) is a degenerative disorder of the human central
nervous system characterized by progressive memory impairment and cognitive and
intellectual decline during mid to late adult life (Katzman, 1986). The disease is
accompanied by a constellation of neuropathologic features, principal amongst which
are the presence of extracellular amyloid or senile plaques and the neurofibrillary
degeneration of neurons. The etiology of this disease is complex, although in some
families it appears to be inherited as an autosomal dominant trait. However, even
amongst these inherited forms of AD, there are at least three different genes which
confer inherited susceptibility to this disease (St George-Hyslop et al., 1990). The ε4
(Cys112Arg) allelic polymorphism of the Apolipoprotein E (ApoE) gene has been
associated with AD in a significant proportion of cases with onset late in life
(Saunders et al., 1993; Strittmatter et al., 1993). Similarly, a very small proportion
of familial cases with onset before age 65 years have been associated with mutations
in the β-amyloid precursor protein (APP) gene (Chartier-Harlin et al., 1991; Goate
et al., 1991; Murrell et al., 1991; Karlinsky et al., 1992; Mullan et al., 1992). A
third locus (AD3) associated with a larger proportion of cases with early onset AD
has recently been mapped to chromosome 14q24.3 (Schellenberg et al., 1992; St
George-Hyslop et al., 1992; Van Broeckhoven et al., 1992).

Although chromosome 14q carries several genes which could be regarded as
candidate genes for the site of mutations associated with AD3 (e.g. cFOS,
alpha-1-antichymotrypsin, and cathepsin G), most of these candidate genes have been
excluded on the basis of their physical location outside the AD3 region and/or the
absence of mutations in their respective open reading frames (Schellenberg, GD et
al., 1992; Van Broeckhoven, C et al., 1992; Rogaev et al., 1993; Wong et al.,
1993).

There have been several developments and commercial directions in respect
of treatment of Alzheimer's Disease and diagnosis thereof. Published PCT
application WO 94 23049 describes transfection of high molecular weight YAC DNA
into specific mouse cells. This method is used to analyze large gene complexes; for
example, the transgenic mice may have increased amyloid precursor protein gene
dosage, which mimics the trisomic condition that prevails in Down's Syndrome and
permits the generation of animal models with β-amyloidosis prevalent in individuals
with Alzheimer's Disease. Published international application WO 94 00569 describes
transgenic non-human animals harbouring large transgenes such as the transgene
comprising a human amyloid precursor protein gene. Such animal models can
provide useful models of human genetic diseases such as Alzheimer's Disease.




Canadian Patent application 2096911 describes a nucleic acid coding for
amyloid precursor protein-cleaving protease, which is associated with Alzheimer's
Disease and Down's Syndrome. The genetic information may be used to diagnose
Alzheimer's Disease. The genetic information was isolated from chromosome 19.
Canadian Patent application 2071105 describes detection and treatment of inherited
or acquired Alzheimer's Disease by the use of YAC nucleotide sequences. The YACs
are identified by the numbers 23CB10, 28CA12 and 26FF3.

U.S. Patent 5297562 describes detection of Alzheimer's Disease having two
or more copies of chromosome 21. Treatment involves methods for reducing the
proliferation of chromosome 21 trisomy. Canadian Patent application 2054302
describes monoclonal antibodies which recognize human brain cell nucleus protein
encoded by chromosome 21 and are used to detect changes or expression due to
Alzheimer's Disease or Down's Syndrome. The monoclonal antibody is specific to
a protein encoded by human chromosome 21 and is linked to large pyramidal cells
of human brain tissue.

By extensive effort and a unique approach to investigating the AD3 region of
chromosome 14q, the Alzheimer's related membrane protein (ARMP) gene has been
isolated, cloned and sequenced from within the AD3 region on chromosome 14q24.3.
In addition, direct sequencing of RT-PCR products spanning this 3.0 kb cDNA
transcript isolated from affected members of at least eight large pedigrees linked to
chromosome 14 has led to the discovery of missense mutations in each of these
different pedigrees. These mutations are absent in normal chromosomes. It has now
been established that the ARMP gene is causative of familial Alzheimer's Disease
type AD3. In realizing this link, it is understood that mutations in this gene can be
associated with other cognitive, intellectual, or psychological diseases such as cerebral
hemorrhage, schizophrenia, depression, mental retardation and epilepsy. These
phenotypes are present in these AD families and these phenotypes have been seen in
mutations of the APP protein gene. The Amyloid Precursor Protein (APP) gene is
also associated with inherited Alzheimer's Disease. The identification of both normal
and mutant forms of the ARMP gene and gene products has allowed for the
development of screening and diagnostic tests for ARMP utilizing nucleic acid probes
and antibodies to the gene product. Through interaction with the defective gene
product and the pathway in which this gene product is involved, gene therapy,
manipulation and delivery are now made possible.


SUMMARY OF THE INVENTION 

Various aspects of the invention are summarized as follows. In accordance
with a first aspect of the invention, a purified mammalian polynucleotide is provided
which codes for Alzheimer's related membrane protein (ARMP). The polynucleotide
has a sequence which is the functional equivalent of the DNA sequence of ATCC
deposit ________, deposited April 28, 1995. The mammalian polynucleotide may be
in the form of DNA, genomic DNA, cDNA, mRNA and various fragments and
portions of the gene sequence encoding ARMP. The mammalian DNA is conserved
in many species, including humans and rodents, for example mice. The mouse sequence
encoding ARMP has greater than 95% homology with the human sequence encoding
the same protein.

Purified human nucleotide sequences which encode mutant ARMP have
mutations at nucleotide position i) 685, A→C, ii) 737, A→G, iii) 986, C→A, iv)
1105, C→G, v) 1478, G→A, vi) 1027, C→T, vii) 1102, C→T and viii) 1422, C→G
of Sequence ID No: 1 as well as in the cDNA sequence of a further human clone of
a sequence identified by ID NO:132.

The nucleotide sequences encoding ARMP have an alternative splice form in
the gene's open reading frame. The human cDNA sequence which codes for ARMP
has sequence ID No. 1 as well as sequence ID NO:132 as sequenced in another
human clone. The mouse sequence which encodes ARMP has sequence ID No. 3, as
well as SEQ ID NO:134 derived from a further clone containing the entire coding
region. Various DNA and RNA probes and primers may be made from appropriate
polynucleotide lengths selected from the sequences. Portions of the sequence also
encode antigenic determinants of the ARMP.

Suitable expression vectors comprising the nucleotide sequences are provided
along with suitable host cells transfected with such expression vectors.

In accordance with another aspect of the invention, purified mammalian
Alzheimer's related membrane protein is provided. The purified protein has an amino
acid sequence encoded by polynucleotide sequence as identified above which for the
human is sequence ID NO:2 and SEQ ID NO:133 (derived from another clone).
The mouse amino acid sequence is defined by sequence ID No. 2 and sequence ID
No. 4, the latter being translated from another clone containing the entire coding
region. The purified protein may have substitution mutations selected from the group
consisting of positions identified in Sequence ID No: 2 and Sequence ID NO:133.
i) M 146 L
ii) H 163 R
iii) A 246 E
iv) L 286 V
v) C 410 Y
vi) A 260 V
vii) A 285 V
viii) L 392 V
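
The substitution labels listed above use one-letter amino acid codes, whereas the figure legends later in this specification use three-letter codes (for example Met 146 Leu). The following is a minimal sketch, not part of the disclosed method, of how such labels might be parsed and converted between the two notations. The function names are illustrative assumptions; the code table is the standard IUPAC amino acid abbreviation set.

```python
import re

# Standard IUPAC one-letter to three-letter amino acid abbreviations.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def parse_substitution(label):
    """Split a label such as 'M 146 L' into (wild type, position, mutant).

    Hypothetical helper: whitespace between the three parts is optional.
    """
    match = re.fullmatch(r"\s*([A-Z])\s*(\d+)\s*([A-Z])\s*", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    for residue in (wild_type, mutant):
        if residue not in THREE_LETTER:
            raise ValueError(f"unknown amino acid code: {residue!r}")
    return wild_type, int(position), mutant


def to_three_letter(label):
    """Render a one-letter label in three-letter form, e.g. 'Met 146 Leu'."""
    wild_type, position, mutant = parse_substitution(label)
    return f"{THREE_LETTER[wild_type]} {position} {THREE_LETTER[mutant]}"


# The eight substitutions listed in the text above.
SUBSTITUTIONS = [
    "M 146 L", "H 163 R", "A 246 E", "L 286 V",
    "C 410 Y", "A 260 V", "A 285 V", "L 392 V",
]

if __name__ == "__main__":
    for label in SUBSTITUTIONS:
        print(label, "=", to_three_letter(label))
```

The parser accepts both spaced and unspaced forms, since both appear in this specification.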

In accordance with another aspect of the invention, polyclonal antibodies
raised to specific predicted sequences of the ARMP protein are provided.
Polypeptides of at least six amino acid residues are provided. The polypeptides of
six or greater amino acid residues may define antigenic epitopes of the ARMP.
Monoclonal antibodies having suitably specific binding affinity for the antigenic
regions of the ARMP are prepared by use of corresponding hybridoma cell lines. In
addition, other polyclonal antibodies may be prepared by inoculation of animals with
suitable peptides or holoprotein which have suitable specific binding affinities for
antigenic regions of the ARMP.

In accordance with another aspect of the invention, an isolated DNA molecule
is provided which codes for E5-1 protein.

In accordance with another aspect of the invention, purified E5-1 protein is
provided, having amino acid Sequence ID No: 137.

In accordance with another aspect of the invention, a bioassay is provided for
determining if a subject has a normal or mutant ARMP, where the bioassay comprises
providing a biological sample from the subject, and
conducting a biological assay on the sample to detect a normal or mutant gene
sequence coding for ARMP, a normal or mutant ARMP amino acid sequence, or a
normal or defective protein function.

In accordance with another aspect of the invention, a process is provided for
producing ARMP comprising culturing one of the above described transfected host
cells under suitable conditions, to produce the ARMP by expressing the DNA
sequence. Alternatively, ARMP may be isolated from mammalian cells in which the
ARMP is normally expressed.

In accordance with another aspect of the invention, a therapeutic
composition is provided comprising ARMP and a pharmaceutically acceptable carrier.

In accordance with another aspect of the invention, a recombinant vector for
transforming a mammalian tissue cell to express therapeutically effective amounts of
ARMP in the cells is provided. The vector is normally delivered to the cells by a
suitable vehicle. Suitable vehicles include vaccinia virus, adenovirus,
adeno-associated virus, retrovirus, liposome transport, neurotropic viruses, Herpes
simplex virus and other vector systems.

In accordance with another aspect of the invention, a method of treating a
patient deficient in normal ARMP comprises administering to the patient a
therapeutically effective amount of the protein targeted at a variety of patient cells
which normally express ARMP, the extent of administration of normal ARMP being
sufficient to override any effect the presence of the mutant ARMP may have on the
patient. As an alternative to protein, suitable ligands and therapeutic agents such as
small molecules and other drug agents may be suitable for drug therapy designed to
replace the protein and defective ARMP, displace mutant ARMP, or to suppress its
formation.

In accordance with another aspect of the invention, an immuno therapy for
treating a patient having Alzheimer's Disease comprises treating the patient with
antibodies specific to the mutant ARMP to reduce biological levels or activity of the
mutant ARMP in the patient. To facilitate such amino acid therapy, a vaccine
composition may be provided for evoking an immune response in a patient of
Alzheimer's Disease where the composition comprises a mutant ARMP and a
pharmaceutically acceptable carrier with or without a suitable excipient. The
antibodies developed specific to the mutant ARMP could be used to target
appropriately encapsulated drugs/molecules to specific cellular/tissue sites. Therapies
utilising specific ligands which bind to normal or wild type ARMP of either mutant
or wild type and which augment normal function of ARMP in membranes and/or
cells or inhibit the deleterious effect of the mutant protein are also made possible.

In accordance with another aspect of the invention, a transgenic animal model
for Alzheimer's Disease is provided which has the mammalian polynucleotide
sequence with at least one mutation which when expressed results in mutant ARMP
in the animal cells and thereby manifests a phenotype. For example, the human Prion
gene when over-expressed in rodent peripheral nervous system and muscle cells
causes a quite different response in the animal than the human. The animal may be
a rodent and is preferably a mouse, but may also be other animals including rat, pig,
Drosophila melanogaster and C. elegans (nematode), all of which are used for
transgenic models. Yeast cells can also be used in which the ARMP sequence is
expressed from an artificial vector.

In accordance with another aspect of the invention, a transgenic mouse model
for Alzheimer's Disease has the mouse gene encoding ARMP human or murine
homologues mutated to manifest the symptoms. The transgenic mouse may exhibit
symptoms of cognitive memory or behavioural disturbances. In addition or
alternatively, the symptoms may appear as other cellular tissue disorders such as
in mouse liver, kidney, spleen or bone marrow and other organs in which the ARMP
gene product is normally expressed.

In accordance with another aspect of the invention, the protein can be used as
a starting point for rational drug design to provide ligands, therapeutic drugs or other
types of small chemical molecules.

BRIEF DESCRIPTION OF THE DRAWINGS

Various aspects of the invention are described hereinafter with respect to the
drawings wherein:

Figure 1a. Genomic physical and transcriptional map of the AD3 region of
chromosome 14. Genetic map inter-marker genetic distances averaged for male and
female meiosis are indicated in centiMorgans.

Figure 1b. The constructed physical contig map of overlapping genomic DNA
fragments cloned into YACs spanning a FAD locus on chromosome 14q.

Figure 1c. Regions of interest within the constructed physical contig map.

Figure 1d. Transcriptional map illustrating physical locations of the 19 independent
longer cDNA clones.

Figure 2. Automated fluorescent chromatograms representing the change in
nucleic acids which direct (by the codon) the amino acid sequence of the gene.

(a) Met 146 Leu

(b) His 163 Arg

(c) Ala 246 Glu

(d) Leu 286 Val

(e) Cys 410 Tyr

Figure 3(a). Restriction fragments of M 146 L mutation using BspHI restriction
enzyme in AD patients. Absence of a restriction site indicates a mutant allele.

Figure 3(b). Presence of the His 163 Arg mutation detected by NlaIII restriction
digestion. Absence of a restriction site indicates a mutant allele.

Figure 3(c). Presence of the Ala 246 Glu mutation in AD patients using DdeI
restriction enzyme. Presence of mutant allele leads to restriction.

Figure 3(d). Presence of Cys 410 Tyr mutation in AD patients as assayed using
allelic specific oligonucleotides.

Figure 3(e). Presence of Leu 286 Val mutation in AD patients using PvuII
restriction enzyme.

Figure 4. RNA blot demonstrating the expression of ARMP protein mRNA in
different regions of the brain including amygdala, caudate, corpus callosum,
hippocampus, hypothalamus, substantia nigra, subthalamic nucleus and thalamus.

Figure 5. RNA blot demonstrating the expression of ARMP protein mRNA in a
variety of tissues including heart, brain, placenta, lung, liver, skeletal muscle, kidney
and pancreas.

Figure 6a. Hydropathy plot of the putative ARMP protein. 

Figure 6b. A model for the structural organization of the putative ARMP protein.
Roman numerals depict the transmembrane domains. Putative glycosylation sites are
indicated as asterisks and most of the phosphorylation sites are located on the same
membrane face as the two acidic hydrophilic loops. The MAP kinase site is present
at residue 115 and the PKC site at residue 114. FAD mutation sites are indicated by
horizontal arrows.

Figure 7 shows transcription of the E5-1 gene, investigated by hybridization of the
E5-1 cDNA to Northern blots of mRNA from multiple human brain regions (Panel
A), and several peripheral tissues (Panel C). In brain, the E5-1 transcript is of a
lower molecular weight and lesser abundance than the ARMP transcript (Panel B)
hybridized to the same blot using identical conditions.

Figure 8 shows the predicted structure of the E5-1 protein.
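
The restriction-digest readouts described for Figures 3(a) to 3(e) depend on whether a mutation creates or destroys an enzyme recognition site. The sketch below illustrates that idea only. The recognition sequences are the commonly catalogued ones for these enzymes and are not stated in this specification, and the two short test fragments are hypothetical sequences invented for illustration, not excerpts from Sequence ID No: 1.

```python
import re

# Recognition sequences (5'->3') as commonly catalogued for the enzymes named
# in Figures 3(a)-3(e). Assumption: these are not given in the specification.
RECOGNITION_SITES = {
    "BspHI": "TCATGA",
    "NlaIII": "CATG",
    "DdeI": "CTNAG",
    "PvuII": "CAGCTG",
}

# Minimal IUPAC-to-regex mapping; only the symbols used above are included.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_positions(sequence, enzyme):
    """Return 0-based start positions of the enzyme's site in the sequence.

    All four sites are palindromic, so scanning one strand is sufficient.
    """
    pattern = "".join(IUPAC[base] for base in RECOGNITION_SITES[enzyme])
    # A lookahead is used so that overlapping sites are all reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


def is_cut(sequence, enzyme):
    """True when the sequence carries at least one recognition site."""
    return bool(site_positions(sequence, enzyme))


if __name__ == "__main__":
    # Hypothetical fragments differing by a single A->C change.
    wild_type = "GGCTCATGATTAC"
    mutant = "GGCTCCTGATTAC"
    print("wild type cut by BspHI:", is_cut(wild_type, "BspHI"))
    print("mutant cut by BspHI:   ", is_cut(mutant, "BspHI"))
```

In this toy case the single-base change removes the BspHI site, which mirrors the reading rule in Figure 3(a), where absence of the site indicates a mutant allele. For a gain-of-site assay such as the one in Figure 3(c), the same function would be read the other way round.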

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In order to facilitate review of the various embodiments of the invention and
an understanding of various elements and constituents used in making the invention
and using same, the following definitions of terms used in the invention description
are provided:

Alzheimer Related Membrane Protein gene (ARMP gene) - the chromosome 14
gene which when mutated is associated with familial Alzheimer's Disease and/or
other inheritable disease phenotypes (e.g. cerebral hemorrhage, mental retardation,
schizophrenia, psychosis, and depression). This definition is understood to include
the various sequence polymorphisms that exist, wherein nucleotide substitutions in the
gene sequence do not affect the essential function of the gene product, as well as
functional equivalents of the nucleotide sequences of Sequence ID No. 1, Sequence
ID NO:132, Sequence ID No: 3 and Sequence ID NO:134. This term primarily
relates to an isolated coding sequence, but can also include some or all of the flanking
regulatory elements and/or introns. The term ARMP gene includes the gene in other
species analogous to the human gene which when mutated is associated with
Alzheimer's Disease.

Alzheimer Related Membrane Protein (ARMP) - the protein encoded by the ARMP
gene. The preferred source of protein is the mammalian protein as isolated from
humans or animals. Alternatively, functionally equivalent proteins may exist in
plants, insects and invertebrates (such as C. elegans). The protein may be produced
by recombinant organisms, or chemically or enzymatically synthesized. This
definition is understood to include functional variants such as the various polymorphic
forms of the protein wherein amino acid substitutions or deletions within the amino
acid sequence do not affect the essential functioning of the protein, or its structure.
It also includes functional fragments of ARMP.

Mutant ARMP gene - the ARMP gene containing one or more mutations which lead
to Alzheimer's Disease and/or other inheritable disease phenotypes (e.g. cerebral
hemorrhage, mental retardation, schizophrenia, psychosis, and depression). This
definition is understood to include the various mutations that exist, wherein nucleotide
substitutions in the gene sequence affect the essential function of the gene product,
as well as mutations of functional equivalents of the nucleotide sequences of Sequence
ID No. 1, Sequence ID NO:132, Sequence ID No: 3, and Sequence ID NO:134 (the
corresponding amino acid sequences). This term primarily relates to an isolated
coding sequence, but can also include some or all of the flanking regulatory elements
and/or introns.

Mutant ARMP - a mammalian protein that is highly analogous to ARMP in terms
of primary structure, but wherein one or more amino acid deletions and/or
substitutions result in impairment of its essential function, so that mammals,
especially humans, whose ARMP producing cells express mutant ARMP rather than
the normal ARMP, demonstrate the symptoms of Alzheimer's Disease and/or other
relevant inheritable phenotypes (e.g. cerebral hemorrhage, mental retardation,
schizophrenia, psychosis, and depression).

mARMP gene - mouse gene analogous to the human ARMP gene.

Functional equivalent - as used in describing gene sequences and amino acid
sequences, means that a recited sequence need not be identical to the definitive
sequence of the Sequence ID Nos but need only provide a sequence which functions
biologically and/or chemically as the equivalent of the definitive sequence. Hence
sequences which correspond to a definitive sequence may also be considered as
functionally equivalent sequences.

mARMP - mouse Alzheimer related membrane protein, analogous to the human
ARMP, encoded by the mARMP gene. This definition is understood to include the
various polymorphic forms of the protein wherein amino acid substitutions or
deletions of the sequence do not affect the essential functioning of the protein, or
its structure.

Mutant mARMP - a mouse protein which is highly analogous to mARMP in terms
of primary structure, but wherein one or more amino acid deletions and/or
substitutions result in impairment of its essential function, so that mice, whose
mARMP producing cells express mutant mARMP rather than the normal mARMP,
demonstrate the symptoms of Alzheimer's Disease and/or other relevant inheritable
phenotypes, or other phenotypes and behaviours as manifested in mice.

ARMP carrier - a mammal in apparent good health whose chromosomes contain a
mutant ARMP gene that may be transmitted to the offspring and who will develop
Alzheimer's Disease in mid to late adult life.

Missense mutation - a mutation of nucleic acid sequence which alters a codon to that
of another amino acid, causing an altered translation product to be made.

Pedigree - in human genetics, a diagram showing the ancestral relationships and
transmission of genetic traits over several generations in a family.

E5-1 gene - the chromosome 1 gene which shows homology to the ARMP gene and
which when mutated is associated with familial Alzheimer's Disease and/or other
inheritable disease phenotypes. This definition is understood to include the various
sequence polymorphisms that exist, wherein nucleotide substitutions in the gene
sequence do not affect the essential function of the gene product, as well as functional
equivalents of the nucleotide Sequence ID No: 136. This term also includes the gene
in other species analogous to the human gene described herein.
E5-1 protein - the protein encoded by the E5-1 gene. This term includes the protein
of Sequence ID No: 137 and also functional variants such as the various polymorphic
and splice variant forms of the protein wherein amino acid substitutions or deletions
within the amino acid sequence do not affect the essential functioning of the protein.
The term also includes functional fragments of the protein.

Mutant E5-1 gene - the E5-1 gene containing one or more mutations which lead to
Alzheimer's Disease. This term is understood to include the various mutations that
exist, wherein nucleotide substitutions in the gene sequence affect the essential
function of the gene product.

Mutant E5-1 protein - a protein analogous to E5-1 protein but wherein one or more
amino acid deletions and/or substitutions result in impairment of its essential function
such that mammals, especially humans, whose E5-1-producing cells express mutant
E5-1 protein demonstrate the symptoms of Alzheimer's Disease.

Linkage analysis - analysis of co-segregation of a disease trait or disease gene with
polymorphic genetic markers of defined chromosomal location.

hARMP gene - human ARMP gene.

ORF - open reading frame.

PCR - polymerase chain reaction.

contig - continuous cloned regions.

YAC - yeast artificial chromosome.

RT-PCR - reverse transcription polymerase chain reaction.

SSR - simple sequence repeat polymorphism.
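
The definition of a missense mutation given above can be made concrete with the standard genetic code. The following is an illustrative sketch only: the function name and the classification labels are assumptions introduced here, and the codon table is the standard nuclear code rather than anything specific to this specification. Because methionine is encoded only by ATG, the example of an A→C change at the first codon position giving leucine is consistent with the Met 146 Leu substitution described earlier.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order with the
# third codon position varying fastest. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, offset, new_base):
    """Apply a single-base change to a codon and classify its effect.

    offset is the 0-based position within the codon (0, 1 or 2).
    Returns (mutated codon, original residue, new residue, kind).
    """
    codon = codon.upper()
    mutated = codon[:offset] + new_base.upper() + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        kind = "silent"
    elif after == "*":
        kind = "nonsense"
    elif before == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return mutated, before, after, kind


if __name__ == "__main__":
    # Met (ATG) -> Leu (CTG): an amino acid change, hence missense.
    print(classify_substitution("ATG", 0, "C"))
    # Leu (CTG) -> Leu (CTA): same amino acid, hence silent.
    print(classify_substitution("CTG", 2, "A"))
```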

The present invention is concerned with the identification and sequencing of
the mammalian ARMP gene in order to gain insight into the cause and etiology of
familial Alzheimer's Disease. From this information, screening methods and
therapies for the diagnosis and treatment of the disease can be developed. The gene
has been identified, cDNA isolated and cloned, and its transcripts and gene products
identified and sequenced. During such identification of the gene, considerable
sequence information has also been developed on intron information in the ARMP
gene, flanking untranslated information and signal information and information
involving neighbouring genes in the AD3 chromosome region. Direct sequencing of
overlapping RT-PCR products spanning the human gene isolated from affected
members of large pedigrees linked to chromosome 14 has led to the discovery of
missense mutations which co-segregate with the disease.

Although it is generally understood that Alzheimer's Disease is a neurological
disorder, most likely in the brain, expression of ARMP has been found in a variety
of human tissues such as heart, brain, placenta, lung, liver, skeletal muscle, kidney
and pancreas. Although this gene is expressed widely, the clinically apparent
phenotype exists in brain, although it is conceivable that biochemical phenotypes may
exist in these other tissues. As with other genetic diseases such as Huntington's
Disease and APP-Alzheimer's, the clinical disease manifestation may reflect
different biochemistries of different cell types and tissues (which stem from genetics
and the protein). Such findings suggest that AD may not be solely a neurological
disorder but may also be a systemic disorder, hence requiring alternative therapeutic
strategies which may be targeted to other tissues or organs or generally in addition
or separately from neuronal or brain tissues.

The ARMP mutations identified have been related to Alzheimer's Disease
pathology. With the identification and sequencing of the gene and the gene product,
probes and antibodies raised to the gene product can be used in a variety of
hybridization and immunological assays to screen for and detect the presence of either
a normal or mutated gene or gene product.

Patient therapy through removal or blocking of the mutant gene product, as
well as supplementation with the normal gene product by amplification, by genetic
and recombinant techniques or by immunotherapy can now be achieved. Correction
or modification of the defective gene product by protein treatment, immunotherapy
(using antibodies to the defective protein) or knock-out of the mutated gene is now
also possible. Familial Alzheimer's Disease could also be controlled by gene therapy
in which the gene defect is corrected in situ or by the use of recombinant or other
vehicles to deliver a DNA sequence capable of expressing the normal gene product,
or a deliberately mutated version of the gene product whose effect counterbalances
the deleterious consequences of the disease mutation to the affected cells of the
patient.

The present invention is also concerned with the identification and sequencing
of a second gene, the E5-1 gene on chromosome 1, which is associated with familial
Alzheimer's Disease.

Disease mechanism insights and therapies analogous to those described above
in relation to the ARMP gene will be available as a result of the identification and
isolation of the E5-1 gene.

Isolating the Human ARMP Gene

Genetic mapping of the AD3 locus. 
After the initial regional mapping of the AD3 gene locus to 14q24.3 near the
anonymous microsatellite markers D14S43 and D14S53 (Schellenberg, GD et al.,
1992; St George-Hyslop, P et al., 1992; Van Broeckhoven, C et al., 1992), twenty
one pedigrees were used to segregate AD as a putative autosomal dominant trait (St
George-Hyslop, P et al., 1992) and to investigate the segregation of 18 additional
genetic markers from the 14q24.3 region which had been organized into a high
density genetic linkage map (Figure 1b) (Weissenbach et al., 1992; Gyapay et al.,
1994). Pairwise maximum likelihood analyses previously published confirmed
substantial cumulative evidence for linkage between FAD and all of these markers
(Table 1). However, much of the genetic data supporting linkage to these markers
were derived from six large early onset pedigrees FAD1 (Nee et al., 1983), FAD2
(Frommelt et al., 1991), FAD3 (Goudsmit et al., 1981; Pollen, 1993), FAD4 (Foncin
et al., 1985), TOR1.1 (Bergamini, 1991) and 603 (Pericak-Vance et al., 1988) each
of which provide at least one anonymous genetic marker from 14q24.3 (St George-
Hyslop, P et al., 1992).
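
Pairwise linkage evidence of the kind referred to above is conventionally summarized as a LOD score, the base-10 logarithm of the likelihood ratio between linkage at recombination fraction θ and no linkage (θ = 0.5). The sketch below shows the textbook calculation for the simplest case of phase-known, fully informative meioses. It is an assumption-laden illustration: the analyses cited in the text were performed on full pedigrees, which requires dedicated likelihood software, and the counts used in the example are hypothetical rather than taken from Table 1.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant is impossible under complete linkage.
        return float("-inf")
    non_recombinants = meioses - recombinants
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if non_recombinants:
        log_linked += non_recombinants * math.log10(1.0 - theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants, meioses, steps=50):
    """Grid search over theta in [0, 0.5]; returns (best LOD, best theta)."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in thetas)


if __name__ == "__main__":
    # Hypothetical counts: 1 recombinant observed in 10 informative meioses.
    best_lod, best_theta = max_lod(1, 10)
    print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
```

By convention a LOD score of 3 or more is taken as significant evidence for linkage, which is why a small number of large pedigrees can dominate the cumulative result.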

In order to more precisely define the location of the AD3 gene relative to the
known locations of the genetic markers from 14q24.3, recombinational landmarks
were sought by direct inspection of the raw haplotype data only from genotyped
affected members of the six pedigrees showing definitive linkage to chromosome 14.
This selective strategy in this particular instance necessarily discards data from the
reconstructed genotypes of deceased affected members as well as from elderly
asymptomatic members of the large pedigrees, and takes no account of the smaller
pedigrees of uncertain linkage status. However, this strategy is very sound because
it also avoids the acquisition of potentially misleading genotype data acquired either
through errors in the reconstructed genotypes of deceased affected members arising
from non-paternity or sampling errors or from the inclusion of unlinked pedigrees.

Upon inspection of the haplotype data for affected subjects, members of the
six large pedigrees whose genotypes were directly determined revealed obligate
recombinants at D14S48 and D14S53, and at D14S258 and D14S63. The single
recombinant at D14S53, which defines a telomeric boundary for the FAD region,
occurred in the same AD-affected subject of the FAD1 pedigree who had previously
been found to be recombinant at several other markers located telomeric to D14S53,
including D14S48 (St. George-Hyslop, P. et al., 1992). Conversely, the single
recombinant at D14S258, which marks a centromeric boundary of the FAD region,
occurred in an affected member of the FAD3 pedigree who was also recombinant at
several other markers centromeric to D14S258, including D14S63. Both recombinant
subjects had unequivocal evidence of Alzheimer's disease confirmed through standard
clinical tests for the illness in other affected members of their families, and the
genotype of both recombinant subjects was informative and co-segregating at multiple
loci within the interval centromeric to D14S53 and telomeric to D14S258.
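
The boundary reasoning in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker order is simplified, only the relative positions of D14S63, D14S258, D14S53 and D14S48 are taken from the text, D14S77 stands in for the markers lying inside the interval, and the function name `candidate_interval` is hypothetical.

```python
# Simplified marker order, centromere -> telomere (partly assumed for illustration).
MARKER_ORDER = ["D14S63", "D14S258", "D14S77", "D14S53", "D14S48"]


def candidate_interval(recombinant_sets, order=MARKER_ORDER):
    """Return (centromeric flank, markers inside the interval, telomeric flank).

    recombinant_sets holds one set of recombinant markers per affected subject.
    Each subject is assumed to be recombinant over a run of markers that reaches
    one end of the map, so the run excludes that end of the region.
    """
    index = {marker: i for i, marker in enumerate(order)}
    cen, tel = -1, len(order)  # innermost excluded positions found so far
    for markers in recombinant_sets:
        hits = sorted(index[m] for m in markers)
        if not hits:
            continue
        if hits[0] == 0:                      # run starts at the centromeric end
            cen = max(cen, hits[-1])
        elif hits[-1] == len(order) - 1:      # run ends at the telomeric end
            tel = min(tel, hits[0])
        else:
            raise ValueError("internal recombinant: not handled in this sketch")
    cen_flank = order[cen] if cen >= 0 else None
    tel_flank = order[tel] if tel < len(order) else None
    return cen_flank, order[cen + 1:tel], tel_flank


# The two obligate recombinants described in the text.
fad1_subject = {"D14S53", "D14S48"}    # telomeric recombinant
fad3_subject = {"D14S258", "D14S63"}   # centromeric recombinant
print(candidate_interval([fad1_subject, fad3_subject]))
```

The sketch deliberately rejects "internal" recombinants, which mirrors the decision in the text to rely only on the two unambiguous flanking events.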

When the haplotype analyses were enlarged to include the reconstructed
genotypes of deceased affected members of the six large pedigrees as well as data
from the remaining fifteen pedigrees with probabilities for linkage of less than 0.95,
several additional recombinants were detected at one or more marker loci within the
interval between D14S53 and D14S258. Thus, one additional recombinant was
detected in the reconstructed genotype of a deceased affected member of each of three
of the larger FAD pedigrees (FAD1, FAD2 and other related families), and eight
additional recombinants were detected in affected members of five smaller FAD
pedigrees. However, while some of these recombinants might have correctly placed
the AD3 gene within a more defined target region, we were forced to regard these
potentially closer "internal recombinants" as unreliable, not only for the reasons
discussed earlier, but also because they provided mutually inconsistent locations for
the AD3 gene within the D14S53-D14S258 interval.

Construction of a physical contig spanning the AD3 region. 

As an initial step toward cloning the AD3 gene, a contig of overlapping
genomic DNA fragments cloned into yeast artificial chromosome vectors, phage
artificial chromosome vectors and cosmid vectors was constructed (Figure 1b). FISH
mapping studies using cosmids derived from the YAC clones 932c7 and 964f5
suggested that the interval most likely to carry the AD3 gene was at least five
megabases in size. Because the large size of this minimal co-segregating region
would make positional cloning strategies intractable, additional genetic pointers were
sought which focused the search for the AD3 gene to one or more subregions within
the interval flanked by D14S53 and D14S258. Haplotype analyses at the markers
between D14S53 and D14S258 failed to detect statistically significant evidence for
linkage disequilibrium and/or allelic association between the FAD trait and alleles at
any of these markers, irrespective of whether the analyses were restricted to those
pedigrees with early onset forms of FAD, or were generalized to include all
pedigrees. This result was not unexpected given the diverse ethnic origins of our
pedigrees. However, when pedigrees of similar ethnic descent were collated, direct
inspection of the haplotypes observed on the disease-bearing chromosome segregating
in different pedigrees of similar ethnic origin revealed two clusters of marker loci
(Table 2). The first of these clusters was located centromeric to D14S77 (D14S786,
D14S277 and D14S268) and spanned the 0.95 Mb physical interval contained in
YAC 78842 (depicted as region B in Figure 1c). The second cluster was located
telomeric to D14S77 (D14S43, D14S273, and D14S7S) and spanned the ~1 Mb
physical interval included within the overlapping YAC clones 964c2, 74163, 797d11
and part of 854f5 (depicted as region A in Figure 1c). Identical alleles were observed
in at least two pedigrees from the same ethnic origin (Table 2). As part of the strategy,
it was reasoned that the presence of shared alleles at one of these groups of physically
clustered marker loci might reflect the co-inheritance of a small physical region
surrounding the ARMP gene on the original founder chromosome in each ethnic
population. Significantly, each of the shared extended haplotypes was rare in normal
Caucasian populations, and allele sharing was not observed at other groups of markers
spanning similar genetic intervals elsewhere on chromosome 14q24.3.

Transcription mapping and preliminary analysis of candidate genes 

To isolate expressed sequences encoded within both critical intervals, a direct
selection strategy was used involving immobilized, cloned, human genomic DNA as
the hybridization target to recover transcribed sequences from primary complementary
DNA pools derived from human brain mRNA (Rommens et al., 1993).
Approximately 900 putative cDNA fragments of size 100 to 600 base pairs were
recovered from regions A and B in Figure 1c. These fragments were hybridized to
Southern blots containing genomic DNAs from each of the overlapping YAC clones
and genomic DNAs from humans and other mammals. This identified a subset of
151 clones which showed evidence for evolutionary conservation and/or for a
complex structure which suggested that they were derived from spliced mRNA. The
clones within this subset were collated on the basis of physical map location,
cross-hybridization and nucleotide sequence, and were used to screen conventional human
brain cDNA libraries for longer cDNAs. At least 19 independent cDNA clones over
1 kb in length were isolated and then aligned into a partial transcription map of the
AD3 region (Figure 1d). Only three of these transcripts corresponded to known
characterized genes (cFOS, dihydrolipoamide succinyl transferase and latent
transforming growth factor binding protein 2).

Recovery of Potential Candidate Genes 

Each of the open reading frame portions of the candidate genes was recovered
by RT-PCR from mRNA isolated from post-mortem brain tissue of normal control
subjects and from either post-mortem brain tissue or cultured fibroblast cell lines of
affected members of six pedigrees definitively linked to chromosome 14. The
RT-PCR products were then screened for mutations using chemical cleavage and
restriction endonuclease fingerprinting single-strand sequence conformational
polymorphism methods (Saleeba and Cotton, 1993; Liu and Sommer, 1995), and by
direct nucleotide sequencing. With one exception, none of the genes examined,
although of interest, carried changes unique to affected subjects or co-segregating
with the disease. The single exception was the candidate gene represented by clone
S182, which contained a series of nucleotide changes that were not observed in normal
subjects and that altered the predicted amino acid sequence in affected subjects. Although
nucleotide sequence differences were also observed in some of the other genes, most
were in the 3' untranslated regions and none were unique to AD-affected subjects.

The remaining sequences, a subset of which are mapped in Figure 1b together
with additional putative transcriptional sequences not identified in Figure 1c, are
identified in the sequence listings as 14 through 43. Sequence ID Nos: 14 to 43
represent neighbouring genes or fragments of neighbouring genes adjacent to the
hARMP gene, or possibly additional coding fragments arising from alternative splicing
of the hARMP. Sequence ID Nos: 44-125 and 149-159 represent neighbouring
genomic fragments containing both exon and intron information. Such sequences are
useful for creating primers, for creating diagnostic tests, for creating altered regulatory
sequences and for using adjacent genomic sequences to create better animal models.

Characterization of the hARMP gene

Hybridization of the S182 clone to northern blots identified a transcript
expressed widely in many areas of brain and peripheral tissues as a major ~3.0 kb
transcript and a minor transcript of ~7.0 kb (Figures 4 and 5). Although the identity
of the ~7.0 kb transcript is unclear, two observations suggest that the ~3.0 kb
transcript represents an active product of the gene. First, hybridization of the S182 clone
to northern blots containing mRNA from a variety of murine tissues, including brain,
identifies only a single transcript identical in size to the ~3.0 kb human transcript.
Second, all of the longer cDNA clones recovered to date (2.6-2.8 kb), which include both 5'
and 3' UTRs and which account for the ~3.0 kb band on the northern blot, have
mapped exclusively to the same physical region of chromosome 14. From these
experiments the ~7.0 kb transcript could represent either a rare alternatively spliced
or polyadenylated isoform of the ~3.0 kb transcript, or another gene
with homology to S182.

The nucleotide sequence of the major transcript was determined from the
consensus of eleven independent longer cDNA clones and from 3 independent clones
recovered by standard 5' rapid amplification of cDNA ends, and bears no significant
homology to other human genes. The cDNA of the sequenced transcript is provided
in Sequence ID No: 1 and the predicted amino acid sequence is provided in Sequence
ID No: 2. The cDNA sequence of another sequenced human clone is also provided
as Sequence ID No: 132 and its predicted amino acid sequence is provided in Sequence
ID No: 133.

Analysis of the 5' end of multiple cDNA clones and RT-PCR products as well
as corresponding genomic clones indicates that the 5' UTR is contained within at least
two exons and that transcription either begins from two different start sites and/or that
one of the early 5' untranslated exons is alternatively spliced (Table 6). The longest
predicted open reading frame contains 467 amino acids with a small alternatively
spliced exon of 4 amino acids at 25 codons from the putative start codon (Table 3).
This putative start codon is the first in-phase ATG located 63 bp downstream of a
TGA stop codon. Classical Kozak consensus sequences are lacking around the first two
in-phase ATG sequences (Rogaev et al., in preparation). Like other genes lacking
classical 'strong' start codons, the putative 5' UTR of the human transcripts is rich
in GC.

Comparison of the nucleic acid and predicted amino acid sequences with
available databases using the BLAST alignment paradigms revealed modest amino
acid similarity with the C. elegans sperm integral membrane protein SPE-4
(p = 1.5e[exponent illegible]; 24-37% identity over three groups of at least fifty
residues) and weaker similarity
to portions of several other membrane spanning proteins including mammalian
chromogranin A and the alpha subunit of mammalian voltage dependent calcium channels
(Altschul et al., 1990). This modest level of similarity clearly established that S182
and SPE-4 are not the same gene. The
amino acid sequence similarities across putative transmembrane domains may
occasionally yield alignment that simply arises from the limited number of
hydrophobic amino acids, but there is also extended sequence alignment between the S182
protein and SPE-4 at several hydrophilic domains. Both the putative S182 protein
and SPE-4 are predicted to be of comparable size (467 and 465 residues, respectively)
and to contain at least seven transmembrane domains with a large acidic domain
preceding the final predicted transmembrane domain. The S182 protein does have
a longer predicted hydrophilic region at the N terminus.

Further investigation of the hARMP has revealed a host of sequence fragments
which form the hARMP gene and include intron sequence information, 5' end
untranslated sequence information and 3' end untranslated sequence information
(Table 6). Such sequence fragments are identified in Sequence ID Nos. 6 to 13.

Mutations in the S182 transcript 

Direct sequencing of overlapping RT-PCR products spanning the 3.0 kb S182
transcript isolated from affected members of the six large pedigrees linked to
chromosome 14 led to the discovery of eight missense mutations in each of the six
pedigrees (Table 7, Figure 2). Each of these mutations co-segregated with the disease
in the respective pedigrees [Figures 3(a)(b)(c)(d)(e)], and was absent from 142
unrelated neurologically normal subjects drawn from the same ethnic origins as the
FAD pedigrees (284 unrelated chromosomes).

The location of the gene within the physical interval segregating with the AD3
trait, the presence of eight different missense mutations which co-segregate with the
disease trait in six pedigrees definitively linked to chromosome 14, and the absence
of these mutations in 284 independent normal chromosomes cumulatively confirm
that the hARMP gene is the AD3 locus. Further biologic support for this hypothesis
arises both from the fact that the residues mutated in FAD kindreds are conserved in
evolution (Table 3) and occur in domains of the protein which are also highly
conserved, and from the fact that the S182 gene product is expressed at high levels
in most regions of the brain, including those most severely affected by AD.

The DNA sequence for the hARMP gene as cloned has been incorporated into
a plasmid Bluescript. This stable vector has been deposited at ATCC under accession
number ________ on April 28, 1995.

Several mutations in the hARMP gene have been identified which cause a
severe type of familial Alzheimer's Disease. One, or a combination of these
mutations may be responsible for this form of Alzheimer's Disease as well as several
other neurological disorders. The mutations may be any form of nucleotide sequence
alteration or substitution. Specific disease causing mutations in the form of nucleotide
and/or amino acid substitutions have been located, although we anticipate additional
mutations will be found in other families. Each of these nucleotide substitutions
occurred within the putative ORF of the S182 transcript, and would be predicted to
change the encoded amino acid at the following positions, numbering from the first
putative initiation codon. The mutations are listed in respect of their nucleotide
locations in Sequence ID No: 1 and Sequence ID No: 132 (an additional human
clone) and amino acid locations in Sequence ID No: 2 and Sequence ID No: 135 (the
additional human clone).








i)     685, A->C     Met 146 Leu
ii)    737, A->G     His 163 Arg
iii)   986, C->A     Ala 246 Glu
iv)    1105, C->G    Leu 286 Val
v)     1478, G->A    Cys 410 Tyr
vi)    1027, C->T    Ala 260 Val
vii)   1102, C->T    Ala 285 Val
viii)  1422, C->G    Leu 392 Val
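
As a consistency check on the list above, the following Python sketch asks whether each listed single-base change can, under the standard genetic code, convert a codon for the first amino acid into a codon for the second. It does not use the actual cDNA sequence (Sequence ID No: 1 is not reproduced here); the function name `consistent_codons` and the one-letter encoding of the eight entries are illustrative choices, not part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def consistent_codons(ref_aa, alt_aa, ref_base, alt_base):
    """List (reference codon, mutant codon) pairs in which a single
    ref_base -> alt_base change turns ref_aa into alt_aa (one-letter codes)."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for pos in range(3):
            if codon[pos] != ref_base:
                continue
            mutant = codon[:pos] + alt_base + codon[pos + 1:]
            if CODON_TABLE[mutant] == alt_aa:
                hits.append((codon, mutant))
    return hits


# Entries i) to viii): (reference residue, mutant residue, reference base, mutant base).
LISTED_MUTATIONS = [
    ("M", "L", "A", "C"), ("H", "R", "A", "G"), ("A", "E", "C", "A"),
    ("L", "V", "C", "G"), ("C", "Y", "G", "A"), ("A", "V", "C", "T"),
    ("A", "V", "C", "T"), ("L", "V", "C", "G"),
]

for entry in LISTED_MUTATIONS:
    print(entry, consistent_codons(*entry))
```

Every entry yields at least one compatible codon pair, so each listed nucleotide change is at least consistent with the stated amino acid substitution; confirming the actual codon requires the sequence listing itself.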



The Met146Leu, Ala246Glu and Cys410Tyr mutations have not been detected
in the genomic DNA of affected members of the eight remaining small early onset
autosomal dominant FAD pedigrees or six additional families in our collection which
express late FAD onset. We predict that such mutations would not commonly occur
in late onset FAD, which has been excluded by genetic linkage studies from the more
aggressive form of AD linked to chromosome 14q24.3 (St. George-Hyslop, P. et al.,
1992; Schellenberg et al., 1993). The His163Arg mutation has been found in the
genomic DNA of affected members of one additional FAD pedigree for which
positive but not statistically significant evidence for linkage to chromosome 14 had
been established. Age of onset of affected members was consistent with affected
individuals from families linked to chromosome 14.

Mutations Ala260Val, Ala285Val, and Leu392Val all occur within the acidic
hydrophilic loop between putative transmembrane domain 6 (TM6) and transmembrane
domain 7 (TM7) (Figure 6). Two of these mutations (A260V; A285V) and the L286V
mutation are also located in the alternatively spliced domain.

All eight of the mutations can be assayed by a variety of strategies (direct
nucleotide sequencing, allele specific oligos, ligation polymerase chain reaction,
SSCP, RFLPs etc.) using RT-PCR products representing the mature mRNA/cDNA
sequence or genomic DNA. Allele specific oligos were chosen for assaying the
mutations. For the A260V and the A285V mutations, genomic DNA carrying the
exon was amplified using the same PCR primers and methods as for the L286V
mutation. PCR products were then denatured and slot blotted to duplicate nylon
membranes using the slot blot protocol described for the C410Y mutation.

All of the nucleotide substitutions co-segregated with the disease in their
respective pedigrees (Figures 3a to 3e); none were seen in asymptomatic family
members aged more than two standard deviations beyond the mean age of onset, and
none were present on 284 chromosomes from unrelated neurologically normal
subjects drawn from comparable ethnic origins.

Identification of an Alternative Splice Form of the ARMP Gene Product 

During sequencing studies of RT-PCR products for the ARMP gene recovered
from a variety of tissues, it was discovered that some peripheral tissues (principally
white blood cells) demonstrated two alternative splice forms of the ARMP gene. One
form is identical to the (putatively 467 amino acid) isoform constitutively expressed
in all brain regions. The alternative splice form results from the exclusion of the
segment of the cDNA between base pairs 1018 to 1116 inclusive, and results in a
truncated isoform of the ARMP protein wherein the hydrophobic part of the
hydrophilic acidically-charged loop immediately C-terminal to TM6 is removed. This
alternatively spliced isoform is therefore characterized by preservation of the sequence
N-terminal to and including the tyrosine at position 256, changing of the aspartate at
257 to alanine, and splicing on to the C-terminal part of the protein from and
including tyrosine 291. Such splicing differences are often associated with important
functional domains of proteins. This argues that this hydrophilic loop (and
consequently the N-terminal hydrophilic loop with similar amino acid charge) is/are
active functional domains of the ARMP product and thus sites for therapeutic
targeting.
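
The arithmetic behind this splice form can be checked with a short Python sketch. It uses only the two coordinates given above (base pairs 1018 and 1116, inclusive); the function name `excised_segment` is hypothetical.

```python
def excised_segment(first_bp, last_bp):
    """Return (length in bp, whether the reading frame is preserved,
    number of whole codons' worth of sequence removed) for an inclusive range."""
    length = last_bp - first_bp + 1
    return length, length % 3 == 0, length // 3


length, in_frame, codons = excised_segment(1018, 1116)
print(length, in_frame, codons)
```

Because the excluded segment is a whole multiple of three nucleotides, the downstream reading frame is unchanged, which fits the description of the isoform rejoining the normal C-terminal sequence at tyrosine 291.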

ARMP Protein

With respect to DNA SEQ ID NO: 1 and DNA SEQ ID NO: 132, analysis of
the sequence of overlapping cDNA clones predicted an ORF protein of 467 amino
acids when read from the first in-phase ATG start codon, and a molecular mass of
approximately 52.6 kDa. As later described, owing to either polymorphisms in the protein
or alternate splicing of the transcript, the molecular weight of the protein can vary
because of possible substitutions or deletions of amino acids.

The analysis of the predicted amino acid sequence using the Hopp and Woods
algorithm suggested that the protein product is a multispanning integral membrane
protein such as a receptor, a channel protein, or a structural membrane protein. The
absence of a recognizable signal peptide and the paucity of glycosylation sites are
noteworthy, and the hydropathy profile suggests that the protein is less likely to be
a soluble protein with a highly compact three-dimensional structure.
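
A sliding-window hydropathy scan of the general kind referred to above can be sketched as follows. Note the substitution: this sketch uses the Kyte-Doolittle hydropathy scale rather than the Hopp and Woods hydrophilicity scale named in the text, and the 19-residue window and 1.6 threshold are conventional choices assumed here, not values taken from the specification. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean exceeds the threshold into
    1-based inclusive (start, end) spans of candidate membrane segments."""
    spans = []
    for i, mean in enumerate(window_hydropathy(seq, window)):
        if mean > threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Toy sequence: a 19-residue hydrophobic stretch flanked by charged residues.
toy = "DEDEKRKDEE" + "L" * 19 + "KKDEDERKDE"
print(candidate_tm_segments(toy))
```

Applied to a real sequence, the number of spans returned gives a rough count of putative membrane-spanning domains; the span edges are approximate because neighbouring windows that are only partly hydrophobic can still pass the threshold.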

The protein may be a cellular protein with a highly compact three-dimensional
structure, in which respect it may be similar to APOE, which is also related to
Alzheimer's Disease. In light of this putative functional role, it is proposed that this
protein be labelled as the Alzheimer Related Membrane Protein (ARMP). The
protein also contains a number of potential phosphorylation sites, one of which is the
consensus site for MAP kinase, which is also involved in the hyperphosphorylation of
tau during the conversion of normal tau to neurofibrillary tangles. This consensus
sequence may provide a common putative pathway linking this protein and other
known biochemical aspects of Alzheimer's Disease and would represent a likely
therapeutic target. Review of the protein structure reveals two sequences, YTPF
(residues 115-119) and STPE (residues 353-356), which represent the S/T-P motif
that is the MAP kinase consensus sequence. Several other phosphorylation sites
exist with consensus sequences for Protein Kinase C activity. Because protein kinase
C activity is associated with differences in the metabolism of APP which are relevant
to Alzheimer's Disease, these sites on the ARMP protein and homologues are sites
for therapeutic targeting.
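
Locating candidate S/T-P sites in a protein sequence is a simple pattern search, sketched below in Python. The toy sequence is invented for illustration and merely embeds the two tetrapeptides named above; it is not the ARMP sequence, and the function name `find_stp_sites` is hypothetical.

```python
import re


def find_stp_sites(protein):
    """Return 1-based positions of Ser/Thr residues immediately followed by Pro."""
    return [m.start() + 1 for m in re.finditer(r"[ST](?=P)", protein)]


# Invented toy sequence containing YTPF and STPE.
toy = "MAYTPFGGSTPEK"
print(find_stp_sites(toy))
```

The lookahead `(?=P)` reports the position of the phosphorylatable serine or threonine itself rather than the start of a longer match.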

The N-terminal is characterized by a highly hydrophilic acidic charged domain
with several potential phosphorylation domains, followed sequentially by a
hydrophobic membrane spanning domain of 19 residues; a charged hydrophilic loop;
then five additional hydrophobic membrane spanning domains interspersed with short
(5-20 residue) hydrophilic domains; an additional larger acidic hydrophilic charged
loop; and then at least one and possibly two other hydrophobic potentially membrane
spanning domains culminating in a polar domain at the C-terminus (Table 4 and
Figure 6B). The presence of seven membrane spanning domains is characteristic of
several classes of G-protein-coupled receptor proteins but is also observed with other
proteins including channel proteins.

Comparison of the nucleic acid and predicted amino acid sequences with
available databases using the BLAST alignment paradigms revealed amino acid
similarity with the C. elegans sperm integral membrane protein spe-4 and a similarity
to several other membrane spanning proteins including mammalian chromogranin A
and the α-subunit of mammalian voltage dependent calcium channels.

The similarity between the putative products of the spe-4 and ARMP genes
implies that they may have similar activities. The SPE-4 protein of C. elegans
appears to be involved in the formation and stabilization of the fibrous body-membrane
organelle (FBMO) complex during spermatogenesis. The FBMO is a
specialized Golgi-derived organelle, consisting of a membrane bound vesicle attached
to and partly surrounding a complex of parallel protein fibers, and may be involved
in the transport and storage of soluble and membrane-bound polypeptides. Mutations
in spe-4 disrupt the FBMO complexes and arrest spermatogenesis. Therefore the
physiologic function of spe-4 may be either to stabilize interactions between integral
membrane budding and fusion events, or to stabilize interactions between the
membrane and fibrillary proteins during the intracellular transport of the FBMO
complex during spermatogenesis. Comparable functions could be envisaged for the
ARMP. The ARMP could be involved either in the docking of other membrane-bound
proteins such as βAPP, or in the axonal transport and fusion budding of
membrane-bound vesicles during protein transport such as in the Golgi apparatus or
endosome-lysosome system. If correct, then mutations might be expected to result
in aberrant transport and processing of βAPP and/or abnormal interactions with
cytoskeletal proteins such as the microtubule-associated protein Tau. Abnormalities
in the intracellular and in the extracellular disposition of both βAPP and Tau are in
fact an integral part of the neuropathology features of Alzheimer's Disease.

Although the location of the ARMP mutations in highly conserved residues within
conserved domains of the putative proteins suggests that they are pathogenic, at least
three of these mutations are conservative, which is commensurate with the onset of
disease in adult life. Because none of the mutations observed so far are deletions or
nonsense mutations that would be expected to cause a loss of function, we cannot
predict whether these mutations will have a dominant gain-of-function effect and
promote aberrant processing of βAPP or a dominant loss-of-function effect causing
arrest of normal βAPP processing.

An alternative possibility is that the ARMP gene product may represent a
receptor or channel protein. Mutations of such proteins have been causally related
to several other dominant neurological disorders in both vertebrate (e.g. malignant
hyperthermia, hyperkalemic periodic paralysis in humans) and invertebrate
organisms (deg-1(d) mutants in C. elegans). Although the pathology of these other
disorders does not resemble that of Alzheimer's Disease, there is evidence for
functional abnormalities in ion channels in Alzheimer's Disease. For example,
anomalies have been reported in the tetra-ethylammonium-sensitive 113pS potassium
channel and in calcium homeostasis. Perturbations in transmembrane calcium fluxes
might be especially relevant in view of the weak homology between S182 and the
α-1D subunit of voltage-dependent calcium channels and the observations that increases
in intracellular calcium in cultured cells can replicate some of the biochemical
features of Alzheimer's Disease, such as alteration in the phosphorylation of
Tau-microtubule-associated protein and increased production of Aβ peptides.

As mentioned, purified normal ARMP protein is characterized by a molecular
weight of 52.6 kDa. The normal ARMP protein, substantially free of other proteins,
is encoded by the aforementioned SEQ ID NO: 1 and SEQ ID NO: 132. As will be
later discussed, the ARMP protein and fragments thereof may be made by a variety
of methods. Purified mutant ARMP protein is characterized by an FAD-associated
phenotype (necrotic death, apoptotic death, granulovacuolar degeneration,
neurofibrillary degeneration, abnormalities or changes in the metabolism of APP,
Ca2+, K+ and glucose, and in mitochondrial function, energy metabolism and
neurotransmitter metabolism, all of which have been found to be abnormal in human
brain and/or peripheral tissue cells in subjects with Alzheimer's Disease) in a variety
of cells. The mutant ARMP, free of other proteins, is encoded by the mutant DNA
sequence.

Description of the E5-1 gene, a homologue of the ARMP gene

A gene, E5-1, with substantial nucleotide and amino acid homology to the
ARMP gene was identified by using the nucleotide sequence of the cDNA for ARMP
to search databases using the BLASTN paradigm of Altschul et al., 1990. Three
expressed sequence tagged sites (ESTs) identified by accession numbers T03796,
R14600, and R05907 were located which had substantial homology (p < 1.0e-100,
greater than 97% identity over at least 100 contiguous base pairs).

Oligonucleotide primers were produced from these sequences and used to
generate PCR products by reverse transcriptase PCR (RT-PCR). These short
RT-PCR products were partially sequenced to confirm their identity with the
sequences within the database and were then used as hybridization probes to screen
full-length cDNA libraries. Several different cDNAs ranging in size from 1 kb to
2.3 kb were recovered from a cancer cell cDNA library (CaCo-2) and from a human
brain cDNA library (E5-1, Gl-1, cc54, cc32).

The nucleotide sequence of these clones confirmed that all were derivatives
of the same transcript (designated E5-1).

The gene encoding the E5-1 transcript mapped to human chromosome 1 using
hybrid mapping panels and to two clusters of CEPH Mega YAC clones which have
been placed upon a physical contig map (YAC clones 750g7 and 921d12, mapped by
FISH to 1q41; and YAC clone 787g12, which also contains an EST for the leukemia
associated phosphoprotein (LAP18) gene, which has been mapped to 1p36.1-p35) (data
not shown).

Hybridization of the E5-1 cDNA clones to northern blots detected an ~2.3
kilobase mRNA band in many tissues including regions of the brain, as well as a
~2.6 kb mRNA band in muscle, cardiac muscle and pancreas (Figure 7).

In skeletal muscle, cardiac muscle and pancreas, the E5-1 gene is expressed
at relatively higher levels than in brain and as two different transcripts of ~2.3 kb
and ~2.6 kb. Both of the E5-1 transcripts have sizes clearly distinguishable from
that of the 2.7 kb ARMP transcript, and did not cross-hybridize with ARMP probes
at high stringency. The cDNA sequence of the E5-1 gene is identified as Sequence
ID No. 136.

The longest ORF within the E5-1 cDNA consensus nucleotide sequence
predicts a polypeptide containing 448 amino acids (numbering from the first in-phase
ATG codon, which was surrounded by a GCC-agg-GCt-ATG-c Kozak consensus
sequence) (Sequence ID No. 137).

A comparison of the amino acid sequences of hARMP and the E5-1 homologue
protein is shown in Table 8. Identical residues are indicated by vertical lines. The
locations of mutations in the E5-1 gene are indicated by downward pointing arrows.
The locations of the mutations in the hARMP gene are indicated by upward pointing
arrows. Putative TM domains are in open ended boxes. The alternatively spliced
exons are denoted by superscripted (E5-1) or subscripted (hARMP) lettering.

BLASTP alignment analyses also detected significant homology with SPE-4
of C. elegans (P = 3.5e-26; identity = 20-63% over five domains of at least 22
residues), and weak homologies to brain sodium channels (alpha III subunit) and to
the alpha subunit of voltage dependent calcium channels from a variety of species
(P = 0.02; identities 20-28% over two or more domains each of at least 35 residues)
(Altschul et al., 1990). These alignments are similar to those described above for the
ARMP gene. However, the most striking homology to the E5-1 protein was found
with the amino acid sequence predicted for ARMP. ARMP and E5-1 proteins share
63% overall amino acid sequence identity, and several domains display virtually
complete identity (Table 8). Furthermore, all eight residues mutated in ARMP in
subjects with AD3 are conserved in the E5-1 protein (Table 8). As would be
expected, hydrophobicity analyses suggest that both proteins also share a similar
structural organization.

The similarity was greatest in several domains of the protein corresponding
to the intervals between transmembrane domain 1 (TM1) and TM6, and from TM7
to the C-terminus of the ARMP gene. The main difference from ARMP is a
difference in the size and amino acid sequence of the acidically-charged hydrophilic
loop in the position equivalent to the hydrophilic loop between transmembrane
domains TM6 and TM7 in the ARMP protein, and in the sequence of the N-terminal
hydrophilic domains.

Thus, both proteins are predicted to possess seven hydrophobic putative
transmembrane domains, and both proteins bear large acidic hydrophilic domains at
the N-terminus and between TM6 and TM7 (Figs. 6 and 8). A further similarity
arose from analysis of RT-PCR products from brain and muscle RNA, which revealed
that nucleotides 1153-1250 of the E5-1 transcript are alternatively spliced. These
nucleotides encode amino acids 263-296, which are located within the TM6-TM7 loop
domain of the putative E5-1 protein, and which share 94% sequence identity with the
alternatively spliced residues 257-290 in ARMP.

The most noticeable differences between the two predicted amino acid
sequences occur in the central portion of the TM6-TM7
hydrophilic loop (residues 304-374 of ARMP; 310-355 of E5-1), and in the
N-terminal hydrophilic domain (Table 8). By analogy, this domain is also less highly
conserved between the murine and human ARMP genes (identity = 47/60 residues),
and shows no similarity with the equivalent region of SPE-4.
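
Identity figures such as those quoted above are computed from aligned sequences by counting matching columns. The Python sketch below shows one common convention, assumed here, in which columns containing a gap are excluded from the denominator; the two toy aligned strings are invented and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Invented toy alignment: 6 ungapped columns, 4 of them identical.
print(round(percent_identity("MKT-LLA", "MRTALIA"), 1))

# The murine/human figure quoted in the text, expressed as a percentage.
print(round(100.0 * 47 / 60, 1))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give different values, so the denominator should be stated whenever an identity percentage is reported.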

A splice variant of the E5-1 cDNA sequence identified as Sequence ID No.
136 has also been found in all tissues examined. This splice variant lacks the triplet
GAA at nucleotide positions 1338-1340.

A further variant has been found in one normal individual whose E5-1 cDNA
had C replacing T at nucleotide position 626, which does not change the amino acid
sequence.

Mutations of the E5-1 gene associated with Alzheimer's Disease

The strong similarity between ARMP and the E5-1 gene product raised the
possibility that the E5-1 gene might be the site of disease-causing mutations in some
of a small number of early onset AD pedigrees in which genetic linkage studies have
excluded chromosomes 14, 19 and 21. RT-PCR was used to isolate cDNAs
corresponding to the E5-1 transcript from lymphoblasts, fibroblasts or post-mortem
brain tissue of affected members of eight pedigrees with early onset familial AD
(FAD) in which mutations in the βAPP and ARMP genes had previously been
excluded by direct sequencing studies.

Examination of these RT-PCR products detected a heterozygous A→G
substitution at nucleotide 1080 in all four affected members of an extended pedigree
of Italian origin (Flo10) with early onset, pathologically confirmed FAD (onset = 50-70
yrs). This mutation would be predicted to cause a Met→Val missense mutation
at codon 239 (Table 8).

A second mutation (A→T at nucleotide 787) causing an Asn→Ile substitution at
codon 141 was found in affected members of a group of related pedigrees of Volga
German ancestry (represented by cell lines AG09369, AG09907, AG09952, and
AG09905, Coriell Institute, Camden NJ). Significantly, one subject (AG09907) was
homozygous for this mutation, an observation compatible with the in-bred nature of
these pedigrees. Notably, the clinical picture of this subject did not differ
significantly from that of subjects heterozygous for the Asn141Ile mutation. Neither
of the E5-1 gene mutations was found in 284 normal Caucasian controls, nor was
either present in affected members of pedigrees with the AD3 type of AD.

Both of these mutations would be predicted to cause substitution of residues
which are highly conserved within the ARMP/E5-1 gene family.
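
The predicted consequences of the two substitutions can be checked against the standard genetic code. The sketch below is illustrative only. The function name is hypothetical; the actual Asn codon at position 141 is not stated in the text, so both Asn codons are tried; and the synonymous example uses an arbitrary codon rather than the codon containing nucleotide 626.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, position, new_base):
    """Return (reference residue, mutant residue, class) for a single-base change.

    `position` is the 0-based index of the changed base within the codon.
    """
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    ref_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if new_aa == ref_aa:
        kind = "synonymous"
    elif new_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, new_aa, kind


# Met is encoded only by ATG, so an A->G change must hit the first base (GTG = Val).
met_to_val = classify_substitution("ATG", 0, "G")

# Asn is encoded by AAC or AAT; an A->T change at the second base gives Ile either way.
asn_to_ile = [classify_substitution(c, 1, "T") for c in ("AAC", "AAT")]

# Arbitrary example of a T->C change that leaves the residue unchanged (Asp).
silent = classify_substitution("GAT", 2, "C")
```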

The finding of a gene whose product is predicted to share substantial amino 
acid and structural similarities with the ARMP gene product suggests that these 
proteins may be functionally related either as independent proteins with overlapping 
functions but perhaps with slightly different specific activities, as physically associated
subunits of a multimeric polypeptide, or as independent proteins performing
consecutive functions in the same pathway.

The observation of two different missense mutations in conserved domains of 
the E5-1 protein in subjects with a familial form of AD argues that these mutations 
are, like those in the ARMP gene, causal to AD. This conclusion is significant 
because, while the disease phenotypes associated with mutations in the ARMP gene
(onset 30-50 yrs, duration 10 yrs) are subtly different from those associated with
mutations in the E5-1 gene (onset 40-70 yrs; duration up to 20 yrs), the general
similarities clearly argue that the biochemical pathway subsumed by members of this
gene family is central to the genesis of at least early onset AD. The subtle
differences in disease phenotype may reflect a lower level of expression of the E5-1
transcript in the CNS, or may reflect a different role for the E5-1 gene product.

By analogy to the effects of ARMP mutations, E5-1 when mutated may cause 
aberrant processing of APP (Amyloid Precursor Protein) into Aβ peptide,
hyperphosphorylation of Tau microtubule associated protein and abnormalities of 
intracellular calcium homeostasis. Interference with these anomalous interactions 
provides a potential therapy for AD. 

Functional Domains of the ARMP Protein are Defined by Splicing Sites and
Similarities within Other Members of a Gene Family

The ARMP protein is a member of a novel class of transmembrane proteins 
which share substantial amino acid homology. The homology is sufficient that certain 
nucleotide probes and antibodies raised against one can identify other members of this 
gene family. The major difference between members of this family resides in the
amino acid and nucleotide sequence homologous to the hydrophilic acidic loop domain
between putative transmembrane 6 and transmembrane 7 domains of the ARMP gene
and gene product. This region is alternatively spliced in some non-neural tissues, and
is also the site of several pathogenic disease-causing mutations in the ARMP gene.
The variable splicing of this hydrophilic loop, the presence of a high density of
pathogenic mutations within this loop, and the fact that the amino acid sequence of
the loop differs between members of the gene family suggest that this loop is an
important functional domain of the protein and may confer some specificity to the
physiologic and pathogenic interactions which the ARMP gene product undergoes.
Because the N-terminal hydrophilic domain shares the same acidic charge and same
orientation with respect to the membrane, it is very likely that these two domains
share functionality either in a coordinated (together) or independent fashion (e.g.
different ligands or functional properties). As a result, everything said about the
hydrophilic loop shall apply also to the N-terminal hydrophilic domain.

Knowledge of the specificity of the loop can be used to identify ligands and
functional properties of the ARMP gene product (e.g. sites of interactions with APP,
cytosolic proteins such as kinases, Tau, and MAP, etc.). Soluble recombinant fusion
proteins can be made or the nucleotide sequence coding for amino acids within the
loop or parts of the loop can be expressed in suitable vectors (yeast-2-hybrid,
baculovirus, and phage-display systems for instance), and used to identify other
proteins which interact with ARMP in the pathogenesis of Alzheimer's disease and
other neurological and psychiatric diseases. Therapies can be designed to modulate
these interactions and thus to modulate Alzheimer's disease and the other conditions
associated with acquired or inherited abnormalities of the ARMP gene or its gene
products. The potential efficacy of these therapies can be tested by analyzing the
affinity and function of these interactions after exposure to the therapeutic agent by
standard pharmacokinetic measurements of affinity (Kd and Vmax etc.) using synthetic
peptides or recombinant proteins corresponding to functional domains of the ARMP
gene (or its homologues). An alternate method for assaying the effect of any
interactions involving functional domains such as the hydrophilic loop is to monitor
changes in the intracellular trafficking and post-translational modification of the
ARMP gene by in situ hybridization, immunohistochemistry, Western blotting and
metabolic pulse-chase labelling studies in the presence of and in the absence of the
therapeutic agents. A third way is to monitor the effects of "downstream" events
including (i) changes in the intracellular metabolism, trafficking and targeting of APP
and its products; (ii) changes in second messenger events, e.g. cAMP, intracellular
Ca++, protein kinase activities, etc.

Isolation and Purification of the ARMP Protein

The ARMP protein may be isolated and purified by methods selected on the 
basis of properties revealed by its sequence. Since the protein possesses properties
of a membrane-spanning protein, a membrane fraction of cells in which the protein
is highly expressed (e.g. central nervous system cells or cells from other tissues)
would be isolated, and the proteins extracted and solubilized using a detergent.

Purification can be achieved using protein purification procedures such as
chromatography methods (gel-filtration, ion-exchange and immunoaffinity), by
high-performance liquid chromatography (RP-HPLC, ion-exchange HPLC, size-exclusion
HPLC, high-performance chromatofocusing and hydrophobic interaction
chromatography) or by precipitation (immunoprecipitation). Polyacrylamide gel
electrophoresis can also be used to isolate the ARMP protein based on its molecular
weight, charge properties and hydrophobicity.

Similar procedures to those just mentioned could be used to purify the protein 
from cells transfected with vectors containing the ARMP gene (eg. baculovirus 
systems, yeast expression systems, eukaryotic expression systems). 

Purified protein can be used in further biochemical analyses to establish
secondary and tertiary structure which may aid in the design of pharmaceuticals to
interact with the protein, alter protein charge configuration or charge interaction with
other proteins, lipid or saccharide moieties, alter its function in membranes as a
transporter, channel or receptor and/or in cells as an enzyme or structural protein, and
treat the disease.

The protein can also be purified by creating a fusion protein by ligating the
ARMP cDNA sequence to a vector which contains sequence for another peptide (e.g.
GST - glutathione S-transferase). The fusion protein is expressed and
recovered from prokaryotic (e.g. bacterial) or eukaryotic (e.g. baculovirus-infected
insect) cells. The fusion protein can then be purified by affinity chromatography
based upon the fusion vector sequence. The ARMP protein can then be further
purified from the fusion protein by enzymatic cleavage of the fusion protein.

Isolating mouse ARMP gene 

In order to characterize the physiological significance of the normal and 
mutant hARMP gene and gene products in a transgenic mouse model it was necessary 
to recover a mouse homologue of the hARMP gene. We recovered a murine 
homologue for the hARMP gene by screening a mouse cDNA library with a labelled 
human DNA probe and in this manner recovered a 2 kb partial transcript 
(representing the 3' end of the gene) and several RT-PCR products representing the
5' end. Sequencing of the consensus cDNA transcript of the murine homologue
revealed substantial amino acid identity. The cDNA sequence is identified in
Sequence ID No. 3 and the predicted amino acid sequence is provided in Sequence
ID No. 4. Further sequencing of the mouse cDNA transcript has provided the
sequence for the complete coding sequence identified as SEQ ID NO: 134 and the
predicted amino acid sequence from this sequence is provided in SEQ ID NO: 135.
More importantly, all of the amino acids that were mutated in the FAD pedigrees
were conserved between the murine homologue and the normal human variant (Table
3). This conservation of the ARMP gene, as shown in Table 3, indicates that an
orthologous gene exists in the mouse (mARMP), and it is now possible to screen
mouse genomic libraries using human ARMP probes. This will also make it possible
to identify and characterize the ARMP gene in other species. This also provides
evidence of animals with various disease states or disorders currently known or yet
to be elucidated.

Transgenic Mouse Model 

The creation of a mouse model for Alzheimer's Disease is important to the 
understanding of the disease and for the testing of possible therapies. Currently no
unambiguous viable animal model for Alzheimer's Disease exists. 

There are several ways in which to create an animal model for Alzheimer's
Disease. Generation of a specific mutation in the mouse gene such as the identified
hARMP gene mutations is one strategy. Secondly, we could insert a wild type human
gene and/or humanize the murine gene by homologous recombination. Thirdly, it is
also possible to insert a mutant (single or multiple) human gene as genomic or
minigene cDNA constructs using wild type or mutant or artificial promoter elements.
Fourthly, knock-out of the endogenous murine genes may be accomplished by the
insertion of artificially modified fragments of the endogenous gene by homologous
recombination. The modifications include insertion of mutant stop codons, the
deletion of DNA sequences, or the inclusion of recombination elements (loxP sites)
recognized by enzymes such as Cre recombinase.

To inactivate the mARMP gene, chemical or x-ray mutagenesis of mouse
gametes, followed by fertilization, can be applied. Heterozygous offspring can then
be identified by Southern blotting to demonstrate loss of one allele by dosage, or
failure to inherit one parental allele using RFLP markers.

To create a transgenic mouse, a mutant version of hARMP or mARMP can
be inserted into a mouse germ line using standard techniques of oocyte microinjection
or transfection or microinjection into stem cells. Alternatively, if it is desired to
inactivate or replace the endogenous mARMP gene, homologous recombination using
embryonic stem cells may be applied.

For oocyte injection, one or more copies of the mutant or wild type ARMP
gene can be inserted into the pronucleus of a just-fertilized mouse oocyte. This
oocyte is then reimplanted into a pseudo-pregnant foster mother. The liveborn mice
can then be screened for integrants using analysis of tail DNA for the presence of
human ARMP gene sequences. The transgene can be either a complete genomic
sequence injected as a YAC, BAC, PAC or other chromosome DNA fragment, a
cDNA with either the natural promoter or a heterologous promoter, or a minigene
containing all of the coding region and other elements found to be necessary for
optimum expression.

Retroviral infection of early embryos can also be done to insert the mutant or
wild type hARMP. In this method, the mutant or wild type hARMP is inserted into
a retroviral vector which is used to directly infect mouse embryos during the early
stages of development to generate chimeras, some of which will lead to germline
transmission. Similar experiments can be conducted in the case of mutant proteins,
using mutant murine or other animal ARMP gene sequences.

Homologous recombination using stem cells allows for the screening of gene
transfer cells to identify the rare homologous recombination events. Once identified,
these can be used to generate chimeras by injection of mouse blastocysts, and a
proportion of the resulting mice will show germline transmission from the
recombinant line. This methodology is especially useful if inactivation of the
mARMP gene is desired. For example, inactivation of the mARMP gene can be done
by designing a DNA fragment which contains sequences from a mARMP exon
flanking a selectable marker. Homologous recombination leads to the insertion of the
marker sequences in the middle of an exon, inactivating the mARMP gene. DNA
analysis of individual clones can then be used to recognize the homologous
recombination events.

It is also possible to create mutations in the mouse germline by injecting 
oligonucleotides containing the mutation of interest and screening the resulting cells 
by PCR.

This embodiment of the invention has the most significant commercial value 
as a mouse model for Alzheimer's Disease. Because of the high percentage of
sequence conservation between human and mouse it is contemplated that an 
orthologous gene will exist also in many other species. It is thus contemplated that 
it will be possible to generate other animal models using similar technology. 

Screening and Diagnosis for Alzheimer's Disease

General Diagnostic Uses of the ARMP Gene and Gene Product

The ARMP gene and gene products will be useful for diagnosis of Alzheimer's
disease, presenile and senile dementias, psychiatric diseases such as schizophrenia,
depression, etc., and neurologic diseases such as stroke and cerebral hemorrhage,
all of which are seen to a greater or lesser extent in symptomatic subjects bearing
mutations in the ARMP gene or in the APP gene. Diagnosis of inherited cases of
these diseases can be accomplished by analysis of the nucleotide sequence (including
genomic and cDNA sequences included in this patent). Diagnosis can also be achieved
by monitoring alterations in the electrophoretic mobility and by the reaction with
specific antibodies to mutant or wild-type ARMP gene products, and by functional
assays demonstrating altered function of the ARMP gene product. In addition, the
ARMP gene and ARMP gene products can be used to search for inherited anomalies
in the gene and/or its products (as well as those of the homologous gene) and can also
be used for diagnosis in the same way as they can be used for diagnosis of
non-genetic cases.

Diagnosis of non-inherited cases can be made by observation of alterations in
the ARMP transcription, translation, and post-translational modification and
processing as well as alterations in the intracellular and extracellular trafficking of
ARMP gene products in the brain and peripheral cells. Such changes will include
alterations in the amount of ARMP messenger RNA and/or protein, alteration in
phosphorylation state, abnormal intracellular location/distribution, abnormal
extracellular distribution, etc. Such assays will include: Northern Blots (with
ARMP-specific and ARMP-non-specific nucleotide probes which also cross-react with
other members of the gene family), and Western blots and enzyme-linked
immunosorbent assays (ELISA) (with antibodies raised specifically to: ARMP; to
various functional domains of ARMP; to other members of the homologous gene
family; and to various post-translational modification states including glycosylated and
phosphorylated isoforms). These assays can be performed on peripheral tissues (e.g.
blood cells, plasma, cultured or other fibroblast tissues, etc.) as well as on biopsies
of CNS tissues obtained antemortem or postmortem, and upon cerebrospinal fluid.
Such assays might also include in-situ hybridization and immunohistochemistry (to
localize messenger RNA and protein to specific subcellular compartments and/or
within neuropathological structures associated with these diseases such as
neurofibrillary tangles and amyloid plaques).

Screening for Alzheimer's Disease

Screening for Alzheimer's Disease as linked to chromosome 14 may now be 
readily carried out because of the knowledge of the mutations in the gene. 

People with a high risk for Alzheimer's Disease (present in family pedigree),
individuals not previously known to be at risk, or people in general may be
screened routinely using probes to detect the presence of a mutant ARMP gene by a
variety of techniques. Genomic DNA used for the diagnosis may be obtained from
body cells, such as those present in the blood, tissue biopsy, surgical specimen, or
autopsy material. The DNA may be isolated and used directly for detection of a
specific sequence or may be PCR amplified prior to analysis. RNA or cDNA may
also be used. To detect a specific DNA sequence, hybridization using specific
oligonucleotides, direct DNA sequencing, restriction enzyme digest, RNase
protection, chemical cleavage, and ligase-mediated detection are all methods which
can be utilized. Oligonucleotides specific to mutant sequences can be chemically
synthesized and labelled radioactively with isotopes, or non-radioactively using biotin
tags, and hybridized to individual DNA samples immobilized on membranes or other
solid supports by dot-blot or transfer from gels after electrophoresis. The presence
or absence of these mutant sequences is then visualized using methods such as
autoradiography, fluorometry, or colorimetric reaction. Examples of suitable PCR
primers which are useful, for example, in amplifying portions of the subject sequence
containing the aforementioned mutations are set out in Table 5. This table also sets
out the change in enzyme site to provide a useful diagnostic tool as defined herein.

Direct DNA sequencing reveals sequence differences between normal and
mutant ARMP DNA. Cloned DNA segments may be used as probes to detect
specific DNA segments. PCR can be used to enhance the sensitivity of this method.
PCR is an enzymatic amplification directed by sequence-specific primers, and
involves repeated cycles of heat denaturation of the DNA, annealing of the
complementary primers and extension of the annealed primer with a DNA
polymerase. This results in an exponential increase of the target DNA.
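
The exponential increase described above can be made concrete with a short calculation. The Python sketch below is an idealised model, not a description of any particular protocol. It assumes each cycle multiplies the number of target molecules by (1 + efficiency), where an efficiency of 1.0 corresponds to perfect doubling. The function name and parameters are illustrative.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealised number of target molecules after a given number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means every template is duplicated in every cycle).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting molecule, 30 cycles of perfect doubling: 2**30 molecules.
ideal = pcr_copies(1, 30)

# The same run at an assumed 90% per-cycle efficiency yields far fewer copies.
realistic = pcr_copies(1, 30, efficiency=0.9)
```

In practice amplification plateaus as primers and nucleotides are consumed, so the model only applies to the early cycles.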

Other nucleotide sequence amplification techniques may be used, such as
ligation-mediated PCR, anchored PCR and enzymatic amplification as would be
understood by those skilled in the art.

Sequence alterations may also generate fortuitous restriction enzyme
recognition sites which are revealed by the use of appropriate enzyme digestion
followed by gel-blot hybridization. DNA fragments carrying the site (normal or
mutant) are detected by their increase or reduction in size, or by the increase or
decrease of corresponding restriction fragment numbers. Genomic DNA samples may
also be amplified by PCR prior to treatment with the appropriate restriction enzyme
and the fragments of different sizes are visualized under UV light in the presence of
ethidium bromide after gel electrophoresis.
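
The principle described above, that a sequence alteration can create or abolish a restriction site and so change the fragment pattern, can be sketched as follows. The two 26-nucleotide sequences are invented for illustration and differ by a single G to A change that creates an EcoRI recognition site (GAATTC, cut after the first G). They are not ARMP sequences, and the function name is hypothetical.

```python
def fragment_lengths(seq, site, cut_offset):
    """Lengths of the fragments produced by digesting a linear sequence.

    site is the recognition sequence; cut_offset is the number of bases
    from the start of the site to the cut on the top strand.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical "normal" amplicon with no EcoRI site.
normal = "ATGCCGTAGAGTTCTTAGCGTACCGA"
# Hypothetical "mutant": G->A at 0-based index 10 creates GAATTC.
mutant = normal[:10] + "A" + normal[11:]

normal_pattern = fragment_lengths(normal, "GAATTC", 1)  # uncut: one fragment
mutant_pattern = fragment_lengths(mutant, "GAATTC", 1)  # cut once: two fragments
```

Comparing the two patterns is the in-silico analogue of reading the band sizes on the gel.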

Genetic testing based on DNA sequence differences may be achieved by 
detection of alteration in electrophoretic mobility of DNA fragments in gels. Small 
sequence deletions and insertions can be visualized by high resolution gel 
electrophoresis. Small deletions may also be detected as changes in the migration 
pattern of DNA heteroduplexes in non-denaturing gel electrophoresis. Alternatively,
a single base substitution mutation may be detected based on differential PCR product 
length in PCR. The PCR products of the normal and mutant gene could be 
differentially detected in acrylamide gels. 

Nuclease protection assays (S1 or ligase-mediated) also reveal sequence
changes at specific locations.

Alternatively, to confirm or detect a polymorphism, restriction mapping
changes, ligated PCR, ASO, REF-SSCP, chemical cleavage, endonuclease cleavage at
mismatch sites and SSCP may be used. Both REF-SSCP and SSCP are mobility shift
assays which are based upon the change in conformation due to mutations.

DNA fragments may also be visualized by methods in which the individual
DNA samples are not immobilized on membranes. The probe and target sequences
may be in solution or the probe sequence may be immobilized. Autoradiography,
radioactive decay, spectrophotometry, and fluorometry may also be used to identify
specific individual genotypes. Finally, mutations can be detected by direct nucleotide
sequencing.

According to an embodiment of the invention, the portion of the cDNA or
genomic DNA segment that is informative for a mutation can be amplified using
PCR. For example, the DNA segment immediately surrounding the C410Y
mutation acquired from peripheral blood samples from an individual can be screened
using the oligonucleotide primers 885 (tggagactggaacacaac) Sequence ID No: 128 and
893 (gtgtggccagggtagagaact) Sequence ID No: 129. This region would then be
amplified by PCR, the products separated by electrophoresis, and transferred to
membrane. Labelled oligonucleotide probes are then hybridized to the DNA
fragments and autoradiography performed.
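
How a primer pair delimits the amplified region can be illustrated with a small in-silico search. In the Python sketch below, the two primer sequences are those quoted above for primers 885 and 893. The template is entirely synthetic, built here from the primers plus arbitrary filler, because the genomic sequence between the primers is not reproduced in this passage. The function names are hypothetical.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

FWD_885 = "tggagactggaacacaac"
REV_893 = "gtgtggccagggtagagaact"


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (lower-case output)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def amplicon_length(template, fwd, rev):
    """Length of the product bounded by fwd and rev on a top-strand template.

    Returns None if either primer site is absent. The reverse primer anneals
    to the bottom strand, so its site appears on the top strand as its
    reverse complement, downstream of the forward primer.
    """
    template = template.lower()
    start = template.find(fwd.lower())
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Synthetic template: flank + forward primer + 100 nt filler + reverse site + flank.
template = "cc" + FWD_885 + "acgt" * 25 + reverse_complement(REV_893) + "gg"
product = amplicon_length(template, FWD_885, REV_893)
```

The product spans both primer sites, so its length is the two primer lengths plus the intervening sequence.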

ARMP Expression 

As an embodiment of the present invention, ARMP protein may be expressed
using eukaryotic and prokaryotic expression systems. Eukaryotic expression systems
can be used for many studies of the ARMP gene and gene product, including
determination of proper expression and post-translational modifications for full
biological activity; identification of regulatory elements located in the 5' region of
the ARMP gene and their role in tissue regulation of protein expression; production of
large amounts of the normal and mutant protein for isolation and purification; use of
cells expressing the ARMP protein as a functional assay system for antibodies
generated against the protein, to test the effectiveness of pharmacological agents, or as
a component of a signal transduction system; and study of the function of the normal
complete protein, specific portions of the protein, or of naturally occurring and
artificially produced mutant proteins.

Eukaryotic and prokaryotic expression systems were generated using two 
different classes of ARMP nucleotide cDNA sequence inserts. In the first class, 
termed full-length constructs, the entire ARMP cDNA sequence is inserted into the 
expression plasmid in the correct orientation, and includes both the natural 5' UTR 
and 3' UTR sequences as well as the entire open reading frame. The open reading 
frames bear a nucleotide sequence cassette which allows either the wild type open 
reading frame to be included in the expression system or alternatively, single or a
combination of double mutations can be inserted into the open reading frame. This
was accomplished by removing a restriction fragment from the wild type open reading
frame using the enzymes NarI and PflMI and replacing it with a similar fragment
generated by reverse transcriptase PCR and which bears the nucleotide sequence
encoding either the Met146Leu mutation or the His163Arg mutation. A second
restriction fragment was removed from the wild type normal nucleotide sequence for
the open reading frame by cleavage with the enzymes PflMI and NcoI and replaced
with restriction fragments bearing either the nucleotide sequence encoding the
Ala246Glu mutation, or the Ala260Val mutation or the Ala285Val mutation or the
Leu286Val mutation, or the Leu392Val mutation, or the Cys410Tyr mutation.
Finally, a third variant bearing combinations of either the Met146Leu or His163Arg
mutations in tandem with the remaining mutations, was made by linking the
NarI-PflMI fragment bearing these mutations and the PflMI-NcoI fragments bearing the
remaining mutations.
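
The cassette scheme described above lends itself to a simple enumeration. The following Python sketch lists the open reading frame variants that the two exchangeable fragments could in principle yield: wild type, each single mutation, and each tandem pairing of one NarI-PflMI mutation with one PflMI-NcoI mutation. It is a bookkeeping illustration only. The text does not state that every combination was actually built, and the function and variable names are hypothetical.

```python
from itertools import product

# Mutations carried on each exchangeable restriction fragment, as listed in the text.
NARI_PFLMI = ["Met146Leu", "His163Arg"]
PFLMI_NCOI = [
    "Ala246Glu", "Ala260Val", "Ala285Val",
    "Leu286Val", "Leu392Val", "Cys410Tyr",
]


def enumerate_constructs():
    """Return tuples of mutations; the empty tuple denotes the wild type."""
    wild_type = [()]
    singles = [(m,) for m in NARI_PFLMI + PFLMI_NCOI]
    doubles = list(product(NARI_PFLMI, PFLMI_NCOI))
    return wild_type + singles + doubles


constructs = enumerate_constructs()
double_mutants = [c for c in constructs if len(c) == 2]
```

Because the two fragments do not overlap, each tandem construct is fully specified by one choice per fragment, which is why a Cartesian product suffices.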

The second class of cDNA inserts bearing wild type or mutant cDNA
sequences was constructed by removing from the full-length cDNA the 5' UTR and
part of the 3' UTR sequences. The 5' UTR sequence was replaced with a synthetic
oligonucleotide containing a KpnI restriction site and a Kozak initiation site
(oligonucleotide 969: ggtaccgccaccatgacagaggtacctgcac, Sequence ID No: 138). The
3' UTR was replaced with an oligonucleotide corresponding to position 2566 of the
cDNA and bearing an artificial EcoRI site (oligonucleotide
970: gaattcactggctgtagaaaaagac, Sequence ID No: 139). Mutant variants of this
construct were then made by inserting the same mutant sequences described above at
the NarI-PflMI fragment, and at the PflMI-NcoI sites described above.

For eukaryotic expression, these various cDNA constructs bearing the wild type
and mutant sequences described above were cloned into the expression vector pZeoSV
(Invitrogen). For prokaryotic expression, two constructs have been made using the
glutathione S-transferase fusion vector pGEX-KG. The inserts which have been
attached to the GST fusion nucleotide sequence are the same nucleotide sequence
described above (generated with the oligonucleotide primers 969, Sequence ID
No: 138 and 970, Sequence ID No: 139) bearing either the normal open reading frame
nucleotide sequence, or bearing a combination of single and double mutations as
described above. This construct allows expression of the full-length protein in mutant
and wild type variants in prokaryotic cell systems as a GST fusion protein which
allows purification of the full-length protein followed by removal of the GST fusion
product by thrombin digestion. The second prokaryotic cDNA construct was
generated to create a fusion protein with the same vector, and allows the production
of the amino acid sequence corresponding to the hydrophilic acidic loop domain
between TM6 and TM7 of the full-length protein, as either a wild type nucleotide
sequence (thus a wild type amino acid sequence for fusion proteins) or as a mutant
sequence bearing either the Ala285Val mutation, or the Leu286Val mutation, or the
Leu392Val mutation. This was accomplished by recovering wild type or mutant
sequence from appropriate sources of RNA using the oligonucleotide primers
989: ggatccggtccacttcgtatgctg, Sequence ID No: 140, and
990: ttttttgaattcttaggctatggttgtgttcca, Sequence ID No: 141. This allows cloning of the
appropriate mutant or wild type nucleotide sequence corresponding to the hydrophilic
acidic loop domain at the BamHI and the EcoRI sites within the pGEX-KG vector.
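
The engineered features of the four oligonucleotides quoted above can be located with a simple motif search. In the Python sketch below the oligonucleotide sequences are copied from the text as transcribed, and the recognition sequences used for KpnI (ggtacc), EcoRI (gaattc) and BamHI (ggatcc) are the standard ones. Treating gccacc immediately followed by the atg start codon as the "Kozak initiation site" is an assumption about what the text intends. Function names are hypothetical.

```python
OLIGOS = {
    "969": "ggtaccgccaccatgacagaggtacctgcac",
    "970": "gaattcactggctgtagaaaaagac",
    "989": "ggatccggtccacttcgtatgctg",
    "990": "ttttttgaattcttaggctatggttgtgttcca",
}

SITES = {"KpnI": "ggtacc", "EcoRI": "gaattc", "BamHI": "ggatcc"}

# Assumed Kozak core: gccacc directly upstream of the atg initiation codon.
KOZAK_WITH_START = "gccaccatg"


def find_all(seq, motif):
    """0-based start positions of every occurrence of motif in seq."""
    positions = []
    pos = seq.find(motif)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(motif, pos + 1)
    return positions


def site_report(oligos=OLIGOS, sites=SITES):
    """Map each oligonucleotide name to the restriction sites it contains."""
    return {
        name: {
            enzyme: find_all(seq, motif)
            for enzyme, motif in sites.items()
            if motif in seq
        }
        for name, seq in oligos.items()
    }


report = site_report()
kozak_positions = find_all(OLIGOS["969"], KOZAK_WITH_START)
```

The search reports every occurrence, so a motif present more than once in an oligonucleotide as transcribed is listed at each position.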

These prokaryotic expression systems allow the holo-protein or various 
important functional domains of the protein to be recovered as fusion proteins and 
then used for binding studies, structural studies, functional studies, and for the
generation of appropriate antibodies. 

Expression of the ARMP gene in heterologous cell systems can be used to 
demonstrate structure-function relationships. Ligating the ARMP DNA sequence into
a plasmid expression vector to transfect cells is a useful method to test the protein's
influence on various cellular biochemical parameters. Plasmid expression vectors
containing either the entire, normal or mutant human or mouse ARMP sequence or 
portions thereof, can be used in in vitro mutagenesis experiments which will identify 
portions of the protein crucial for regulatory function. 

The DNA sequence can be manipulated in studies to understand the expression
of the gene and its product, to achieve production of large quantities of the protein
for functional analysis, for antibody production, and for patient therapy. The changes
in the sequence may or may not alter the expression pattern in terms of relative
quantities, tissue-specificity and functional properties. Partial or full-length DNA
sequences which encode the ARMP protein, modified or unmodified, may be
ligated to bacterial expression vectors. E. coli can be used with a variety of
expression vector systems, e.g. the T7 RNA polymerase/promoter system using two
plasmids or by labeling of plasmid-encoded proteins, or by expression by infection
with M13 Phage mGPI-2. E. coli vectors can also be used with phage lambda
regulatory sequences, by fusion protein vectors (e.g. lacZ and trpE), by
maltose-binding protein fusions, and by glutathione-S-transferase fusion proteins, etc., all of
which together with many other prokaryotic expression systems are widely available
commercially.

Alternatively, the ARMP protein can be expressed in insect cells using
baculoviral vectors, or in mammalian cells using vaccinia virus or specialised
eukaryotic expression vectors. For expression in mammalian cells, the cDNA
sequence may be ligated to heterologous promoters, such as the simian virus (SV40)
promoter in the pSV2 vector and other similar vectors and introduced into cultured
eukaryotic cells, such as COS cells to achieve transient or long-term expression. The
stable integration of the chimeric gene construct may be maintained in mammalian
cells by biochemical selection, such as neomycin and mycophenolic acid.

The ARMP DNA sequence can be altered using procedures such as restriction 
enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension 
by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA 
sequences and site-directed sequence alteration with the use of specific
oligonucleotides together with PCR.

The cDNA sequence or portions thereof, or a mini gene consisting of a cDNA 
with an intron and its own promoter, is introduced into eukaryotic expression vectors 
by conventional techniques. These vectors permit the transcription of the cDNA in 
eukaryotic cells by providing regulatory sequences that initiate and enhance the
transcription of the cDNA and ensure its proper splicing and polyadenylation. The
endogenous ARMP gene promoter can also be used. Different promoters within
vectors have different activities, which alters the level of expression of the cDNA. In
addition, certain promoters, such as the glucocorticoid-responsive promoter from the
mouse mammary tumor virus, can also modulate function.

Some of the vectors listed contain selectable markers or neo bacterial genes
that permit isolation of cells by chemical selection. Stable long-term vectors can be
maintained in cells as episomal, freely replicating entities by using regulatory
elements of viruses. Cell lines can also be produced which have integrated the vector
into the genomic DNA. In this manner, the gene product is produced on a continuous
basis.

Vectors are introduced into recipient cells by various methods including 
calcium phosphate, strontium phosphate, electroporation, lipofection, DEAE dextran, 
microinjection, or by protoplast fusion. Alternatively, the cDNA can be introduced 
by infection using viral vectors.

Using the techniques mentioned, the expression vectors containing the ARMP
gene or portions thereof can be introduced into a variety of mammalian cells from
other species or into non-mammalian cells.

The recombinant cloning vector, according to this invention, comprises the 
selected DNA of the DNA sequences of this invention for expression in a suitable
host. The DNA is operatively linked in the vector to an expression control sequence
in the recombinant DNA molecule so that normal and mutant ARMP protein can be
expressed. The expression control sequence may be selected from the group
consisting of sequences that control the expression of genes of prokaryotic or
eukaryotic cells and their viruses and combinations thereof. The expression control
sequence may be selected from the group consisting of the lac system, the trp system,
the tac system, the trc system, major operator and promoter regions of phage lambda,
the control region of the fd coat protein, early and late promoters of SV40, promoters
derived from polyoma, adenovirus, retrovirus, baculovirus, simian virus,
3-phosphoglycerate kinase promoter, yeast acid phosphatase promoters, yeast
alpha-mating factors and combinations thereof.

The host cell which may be transfected with the vector of this invention may
be selected from the group consisting of E. coli, Pseudomonas, Bacillus subtilis,
Bacillus stearothermophilus, or other bacilli; other bacteria, yeast, fungi, insect, mouse
or other animal, plant hosts, or human tissue cells.

For the mutant ARMP DNA sequence similar systems are employed to express
and produce the mutant protein.

Antibodies to Detect ARMP 

Antibodies to epitopes within the ARMP protein can be raised to provide
information on the characteristics of the proteins. Generation of antibodies would
enable the visualization of the protein in cells and tissues using Western blotting. In
this technique, proteins are run on a polyacrylamide gel and then transferred onto
nitrocellulose membranes. These membranes are then incubated in the presence of
the primary antibody and, following washing, are incubated with a secondary antibody
which is used for detection of the protein-primary antibody complex. Following
repeated washing, the entire complex is visualized using colourimetric or
chemiluminescent methods.

Antibodies to the ARMP protein also allow for the use of 
immunocytochemistry and immunofluorescence techniques in which the proteins can
be visualized directly in cells and tissues. This is most helpful in order to establish
the subcellular location of the protein and the tissue specificity of the protein.

In order to prepare polyclonal antibodies, fusion proteins containing defined 
portions or all of the ARMP protein can be synthesized in bacteria by expression of 
corresponding DNA sequences in a suitable cloning vehicle. The protein can then be
purified, coupled to a carrier protein and mixed with Freund's adjuvant (to help
stimulate the antigenic response by the rabbits) and injected into rabbits or other
laboratory animals. Alternatively, protein can be isolated from cultured cells
expressing the protein. Following booster injections at bi-weekly intervals, the
rabbits or other laboratory animals are then bled and the sera isolated. The sera can
be used directly or purified prior to use by various methods including affinity
chromatography, Protein A-Sepharose, Antigen Sepharose, Anti-mouse-Ig-Sepharose.
The sera can then be used to probe protein extracts run on a polyacrylamide gel to
identify the ARMP protein. Alternatively, synthetic peptides can be made to the
antigenic portions of the protein and used to inoculate the animals.

To produce monoclonal ARMP antibodies, cells actively expressing the protein
are cultured or isolated from tissues and the cell membranes isolated. The
membranes, extracts, or recombinant protein extracts, containing the ARMP protein,
are injected in Freund's adjuvant into mice. After the mice have been injected 9 times
over a three week period, their spleens are removed and resuspended in phosphate
buffered saline (PBS). The spleen cells serve as a source of lymphocytes, some of which are
producing antibody of the appropriate specificity. These are then fused with a
permanently growing myeloma partner cell, and the products of the fusion are plated
into a number of tissue culture wells in the presence of a selective agent such as
HAT. The wells are then screened by ELISA to identify those containing cells making
useful antibody. These are then freshly plated. After a period of growth, these
wells are again screened to identify antibody-producing cells. Several cloning
procedures are carried out until over 90% of the wells contain single clones which
are positive for antibody production. From this procedure a stable line of clones is
established which produce the antibody. The monoclonal antibody can then be
purified by affinity chromatography using Protein A Sepharose, ion-exchange
chromatography, as well as variations and combinations of these techniques.

In situ hybridization is another method used to detect the expression of ARMP 
protein. In situ hybridization relies upon the hybridization of a specifically labelled 
nucleic acid probe to the cellular RNA in individual cells or tissues. Therefore, it
allows the identification of mRNA within intact tissues, such as the brain. In this
method, oligonucleotides corresponding to unique portions of the ARMP gene are 
used to detect specific mRNA species in the brain. 

In this method a rat is anesthetized and transcardially perfused with cold PBS, 
followed by perfusion with a formaldehyde solution. The brain or other tissue is
then removed, frozen in liquid nitrogen, and cut into thin micron sections. The
sections are placed on slides and incubated in proteinase K. Following rinsing in
DEP, water and ethanol, the slides are placed in prehybridization buffer. A
radioactive probe corresponding to the primer is made by nick translation and
incubated with the sectioned brain tissue. After incubation and air drying, the labeled
areas are visualized by autoradiography. Dark spots on the tissue sample indicate
hybridization of the probe with brain mRNA which demonstrates the expression of
the protein.

Antibodies may also be used coupled to compounds for diagnostic and/or 
therapeutic uses such as radionuclides for imaging and therapy and liposomes for the
targeting of compounds to a specific tissue location.

Isolation and Purification of E5-1 protein

The E5-1 protein may be isolated and purified by the types of methods 
described above for the ARMP protein. 

The protein may also be prepared by expression of the E5-1 cDNA described
herein in a suitable host. The protein is preferably expressed as a fusion protein by
ligating its encoding cDNA sequence to a vector containing the coding sequence for
another suitable peptide, e.g. GST. The fusion protein is expressed and recovered
from prokaryotic cells such as bacterial or baculovirus cells or from eukaryotic cells.
Antibodies to ARMP, by virtue of portions of amino acid sequence identity with
E5-1, can be used to purify, attract and bind to E5-1 protein and vice versa.

Transgenic Mouse Model of E5-1-related Alzheimer's Disease

An animal model of Alzheimer's disease related to mutations of the E5-1 gene
may be created by methods analogous to those described above for the ARMP gene.

Antibodies 

Due to its structural similarity with the ARMP, the E5-1 protein may be used 
for the development of probes, peptides, or antibodies to various peptides within it 
which may recognize both the E5-1 and the ARMP gene and gene products,
respectively. As a protein homologue for the ARMP, the E5-1 protein may be used
as a replacement for a defective ARMP gene product. It may also be used to 
elucidate functions of the ARMP gene in tissue culture and vice versa. 

Screening for Alzheimer's Disease linked to Chromosome 1

Screening for Alzheimer's Disease linked to mutations of the E5-1 gene may
now be conveniently carried out.

General screening methods are described above in relation to the described 
mutations in the ARMP gene. These described methods can be readily applied and
adapted to detection of the described chromosome 1 mutations, as will be readily
understood by those skilled in the art.

In accordance with one embodiment of the invention, the Asn141Ile mutation
is screened for by PCR amplification of the surrounding DNA fragment using the
primers:

1041: 5'-cattcactgaggacacacc (end-labelled) and

1042: 5'-tgtagagcaccaccaaga (unlabelled)

Any tissue with nucleated cells may be examined. The amplified products are
separated by electrophoresis and an autoradiogram of the gel is prepared and
examined for mutant bands.

In accordance with a further embodiment, the Met239Val mutation is screened
for by PCR amplification of its surrounding DNA fragment using the primers:

1034: 5'-gcatggtgtgcatccact and

1035: 5'-ggaccactctgggaggta

The amplified products are separated and an autoradiogram prepared as
described above to detect mutant bands.

The same primer sets may be used to detect the mutations by means of other 
methods such as SSCP, chemical cleavage, DGGE, nucleotide sequencing, ligation 
chain reaction and allele specific oligonucleotides. As will be understood by those 
skilled in the art, other suitable primer pairs may be devised and used. 
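
When alternative primer pairs are devised, basic properties such as length, GC content and an approximate melting temperature are commonly compared first. The following is a minimal sketch for the 1041/1042 pair. The helper names are hypothetical, and the Wallace rule (2 degrees C per A/T, 4 degrees C per G/C) is a generic rule of thumb for short oligonucleotides, not a method stated in this specification.

```python
# Illustrative sketch only; helper names are hypothetical and the
# Wallace rule is a generic approximation, not part of the disclosure.

# Primer sequences for the Asn141Ile screen, as given in the text (5'->3').
PRIMERS = {
    "1041": "cattcactgaggacacacc",
    "1042": "tgtagagcaccaccaaga",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule."""
    seq = seq.lower()
    at = seq.count("a") + seq.count("t")
    gc = seq.count("g") + seq.count("c")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Print name, length, GC fraction and approximate Tm for each primer.
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

A pair whose approximate melting temperatures differ by only a few degrees can usually share one annealing temperature; actual conditions would still need to be established empirically.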

In inherited cases, as the primary event, and in non-inherited cases as a
secondary event due to the disease state, abnormal processing of E5-1, ARMP, APP
or proteins reacting with E5-1, APP or ARMP, may occur. This can be detected as
abnormal phosphorylation, glycosylation, glycation, amidation or proteolytic cleavage
products in body tissues or fluids, e.g. CSF or blood.

Therapies 

An important aspect of the biochemical studies using the genetic information 
of this invention is the development of therapies to circumvent or overcome the 
ARMP gene defect, and thus prevent, treat, control serious symptoms of or cure the
disease. In view of expression of the ARMP gene in a variety of tissues, one has to
recognize that Alzheimer's Disease may not be restricted to the brain. Alzheimer's
Disease manifests itself as a neurological disorder which in one of its forms is caused
by a mutation in the ARMP gene, but such manifestation may be caused by the mutations
in other organ tissues, such as the liver, releasing factors which affect the brain
activity and ultimately cause Alzheimer's Disease. Hence, in considering various
therapies, it is understood that such therapies may be targeted at tissue other than the
brain, such as heart, placenta, lung, liver, skeletal muscle, kidney and pancreas,
where ARMP is also expressed.

The effect of these mutations in E5-1 and ARMP is a gain of a novel function
which causes aberrant processing of Amyloid Precursor Protein (APP) into Aβ
peptide, abnormal phosphorylation homeostasis, and abnormal apoptosis. Therapy
to reverse this will be small molecules (drugs), recombinant proteins, etc. which block
the aberrant function by altering the structure of the mutant protein, enhancing its
metabolic clearance or inhibiting binding of ligands to the mutant protein, or
inhibiting the channel function of the mutant protein. The same effect might be
gained by inserting a second mutant protein by gene therapy similar to the correction
of the "Deg 1(d)" and "Mec 4(d)" mutations in C. elegans by insertion of mutant
transgenes. Alternatively, overexpression of wild type E5-1 protein or wild type ARMP
or both may correct the defect. This could be achieved by the administration of drugs
or proteins to induce the transcription and translation or inhibit the catabolism of the
native E5-1 and ARMP proteins. It could also be accomplished by infusion of recombinant
proteins or by gene therapy with vectors causing expression of the normal protein at
a high level.

Rationale for Therapeutic, Diagnostic, and Investigational Applications of the ARMP
Gene and Gene Products as They Relate to the Amyloid Precursor Protein

The Aβ peptide derivatives of APP are neurotoxic (Selkoe et al. 1994). APP
is metabolized by passage through the Golgi network and then to secretory pathways
via clathrin-coated vesicles with subsequent passage to the plasma membrane where
the mature APP is cleaved by α-secretase to a soluble fraction (Protease Nexin II)
plus a non-amyloidogenic C-terminal peptide (Selkoe et al. 1995, Gandy et al. 1993).
Alternatively, mature APP can be directed to the endosome-lysosome pathway where
it undergoes beta and gamma secretase cleavage to produce the Aβ peptides. The
phosphorylation state of the cell determines the relative balance of the α-secretase
(non-amyloidogenic) and Aβ (amyloidogenic) pathways (Gandy et al. 1993).
The phosphorylation state of the cell can be modified pharmacologically by phorbol
esters, muscarinic agonists and other agents, and appears to be mediated by cytosolic
factors (especially protein kinase C) acting upon an integral membrane protein in the
Golgi network, which we propose to be the ARMP, and members of the homologous
family (all of which carry several phosphorylation consensus sequences for protein
kinase C). Mutations in the ARMP gene will cause alterations in the structure and
function of the ARMP gene product leading to defective interactions with regulatory
elements (e.g. protein kinase C) or with APP, thereby promoting APP to be directed
to the amyloidogenic endosome-lysosome pathway. Environmental factors (viruses,
toxins, and aging etc.) may also have similar effects on ARMP. To treat Alzheimer's
disease, the phosphorylation state of ARMP can be altered by chemical and
biochemical agents (e.g. drugs, peptides and other compounds) which alter the activity
of protein kinase C and other protein kinases, or which alter the activity of protein
phosphatases, or which modify the availability of ARMP to be post-translationally
modified. The interactions between kinases and phosphatases with the ARMP gene
products (and the products of its homologues), and the interactions of the ARMP gene
products with other proteins involved in the trafficking of APP within the Golgi
network can be modulated to decrease trafficking of Golgi vesicles to the
endosome-lysosome pathway thereby promoting Aβ peptide production. Such
compounds will include: peptide analogues of APP, ARMP, and homologues of
ARMP as well as other interacting proteins, lipids, sugars, and agents which promote
differential glycosylation of ARMP and its homologues; agents which alter the
biologic half-life of messenger RNA or protein of ARMP and homologues including
antibodies and antisense oligonucleotides; and agents which act upon ARMP
transcription.

The effect of these agents in cell lines and whole animals can be monitored 
by monitoring: transcription; translation; post-translational modification of ARMP (e.g.
phosphorylation or glycosylation); and intracellular trafficking of ARMP and its
homologues through various intracellular and extracellular compartments. Methods
for these studies include Western and Northern blots; immunoprecipitation after
metabolic labelling (pulse-chase) with radio-labelled methionine and ATP; and
immunohistochemistry. The effect of these agents can also be monitored using
studies which examine the relative binding affinities and relative amounts of ARMP
gene products involved in interactions with protein kinase C and/or APP using either
standard binding affinity assays or co-precipitation and Western blots using antibodies
to protein kinase C, APP or ARMP and its homologues. The effect of these agents
can also be monitored by assessing the production of Aβ peptides by ELISA before
and after exposure to the putative therapeutic agent (Huang et al. 1993). The effect
can also be monitored by assessing the viability of cell lines after exposure to
aluminum salts and to Aβ peptides which are thought to be neurotoxic in Alzheimer's
disease. Finally, the effect of these agents can be monitored by assessing the
cognitive function of animals bearing: their normal genotype at APP or ARMP
homologues; or bearing human APP transgenes (with or without mutations); or
bearing human ARMP transgenes (with or without mutations); or a combination of
all of these.

Rationale for Therapeutic, Diagnostic, and Investigational Applications of the ARMP
Gene, the E5-1 Gene and their Products

The ARMP gene product and the E5-1 gene product have amino acid sequence
homology to human ion channel proteins and receptors. For instance, the E5-1
protein shows substantial homology to the human sodium channel α-subunit (E=0.18,
P=0.16, identities = 22 - 27% over two regions of at least 35 amino acid residues)
using the BLASTP paradigm of Altschul et al. 1990. Other diseases (such as
malignant hyperthermia and hyperkalemic periodic paralysis in humans and the
neurodegeneration of mechanosensory neurons in C. elegans) arise through mutations
in ion channels or receptor proteins. Mutation of the ARMP gene or the E5-1 gene
could affect similar functions and lead to Alzheimer's disease and other psychiatric
and neurological diseases. Based upon this, a test for Alzheimer's disease can be
produced to detect an abnormal receptor or an abnormal ion channel function related
to abnormalities that are acquired or inherited in the ARMP gene and its product, or
in one of the homologous genes such as E5-1 and their products. This test can be
accomplished either in vivo or in vitro by measurements of ion channel fluxes and/or
transmembrane voltage or current fluxes using patch clamp, voltage clamp and
fluorescent dyes sensitive to intracellular calcium or transmembrane voltage.
Defective ion channel or receptor function can also be assayed by measurements of
activation of second messengers such as cyclic AMP, cGMP, tyrosine kinases,
phosphates, increases in intracellular Ca2+ levels, etc. Recombinantly made proteins
may also be reconstructed in artificial membrane systems to study ion channel 
conductance. Therapies which affect Alzheimer's disease (due to acquired/inherited
defects in the ARMP gene or E5-1 gene; due to defects in other pathways leading to
this disease such as mutations in APP; and due to environmental agents) can be tested
by analysis of their ability to modify an abnormal ion channel or receptor function
induced by mutation in the ARMP gene or in one of its homologues. Therapies could
also be tested by their ability to modify the normal function of an ion channel or
receptor capacity of the ARMP gene products and its homologues. Such assays can
be performed on cultured cells expressing endogenous normal or mutant ARMP
genes/gene products or E5-1 genes/gene products. Such studies can be performed in
addition on cells transfected with vectors capable of expressing ARMP, parts of the
ARMP gene and gene product, mutant ARMP, E5-1 gene, parts of the E5-1 gene and
gene product, mutant E5-1 gene or another homologue in normal or mutant form.
Therapies for Alzheimer's disease can be devised to modify an abnormal ion channel
or receptor function of the ARMP gene or E5-1 gene. Such therapies can be
conventional drugs, peptides, sugars, or lipids, as well as antibodies or other ligands
which affect the properties of the ARMP or E5-1 gene product. Such therapies can
also be performed by direct replacement of the ARMP gene and/or E5-1 gene by gene
therapy. In the case of an ion channel, the gene therapy could be performed using
either mini-genes (cDNA plus a promoter) or genomic constructs bearing genomic
DNA sequences for parts or all of the ARMP gene. Mutant ARMP or homologous
gene sequences might also be used to counter the effect of the inherited or acquired
abnormalities of the ARMP gene as has recently been done for replacement of the
mec 4 and deg 1 in C. elegans (Huang and Chalfie, 1994). The therapy might also
be directed at augmenting the receptor or ion channel function of the homologous
genes such as the E5-1 gene, in order that it may potentially take over the functions
of the ARMP gene rendered defective by acquired or inherited defects. Therapy
using antisense oligonucleotides to block the expression of the mutant ARMP gene
or the mutant E5-1 gene, coordinated with gene replacement with normal ARMP or
E5-1 gene can also be applied using standard techniques of either gene therapy or
protein replacement therapy.
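
The two BLASTP statistics quoted above for the sodium channel comparison are related to each other. Under the Poisson model commonly used for BLAST statistics, an expected number E of chance hits corresponds to a probability P = 1 - exp(-E) of finding at least one such hit. The minimal sketch below (the function name is hypothetical) shows that E = 0.18 corresponds to P of about 0.16, which indicates that the reported similarity is modest rather than conclusive on its own.

```python
import math


def blast_p_from_e(e_value: float) -> float:
    """Probability of at least one chance hit, given the expected number E
    of chance hits (Poisson assumption): P = 1 - exp(-E)."""
    if e_value < 0:
        raise ValueError("E must be non-negative")
    return 1.0 - math.exp(-e_value)


if __name__ == "__main__":
    # E-value quoted in the text for the sodium channel comparison.
    print(round(blast_p_from_e(0.18), 2))
```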

Protein Therapy 

Treatment of Alzheimer's Disease can be performed by replacing the mutant 
protein with normal protein, or by modulating the function of the mutant protein. 
Once the biological pathway of the ARMP protein has been completely understood, 
it may also be possible to modify the pathophysiologic pathway (e.g. a signal
transduction pathway) in which the protein participates in order to correct the
physiological defect.

To replace the mutant protein with normal protein, or with a protein bearing
a deliberate counterbalancing mutation, it is necessary to obtain large amounts of pure
ARMP protein or E5-1 protein from cultured cell systems which can express the
protein. Delivery of the protein to the affected brain areas or other tissues can then
be accomplished using appropriate packaging or administrating systems.

Gene Therapy 

Gene therapy is another potential therapeutic approach in which normal copies
of the ARMP gene are introduced into patients to successfully code for normal protein
in several different affected cell types. The gene must be delivered to those cells in
a form in which it can be taken up and code for sufficient protein to provide effective
function. Alternatively, in some neurologic mutants it has been possible to prevent
disease by introducing another copy of the homologous gene bearing a second
mutation in that gene or to alter the mutation, or use another gene to block its effect.

Retroviral vectors can be used for somatic cell gene therapy especially because
of their high efficiency of infection and stable integration and expression. The
targeted cells however must be able to divide and the level of expression of
normal protein should be high because the disease is a dominant one. The full length
ARMP gene can be cloned into a retroviral vector and driven from its endogenous
promoter or from the retroviral long terminal repeat or from a promoter specific for
the target cell type of interest (such as neurons).

Other viral vectors which can be used include adeno-associated virus, vaccinia 
virus, bovine papilloma virus, or a herpesvirus such as Epstein-Barr virus.

Gene transfer could also be achieved using non-viral means requiring infection
in vitro. This would include calcium phosphate, DEAE dextran, electroporation, and
protoplast fusion. Liposomes may also be potentially beneficial for delivery of DNA
into a cell. Although these methods are available, many of these are of lower
efficiency.

Antisense based strategies can be employed to explore ARMP gene function
and as a basis for therapeutic drug design. The principle is based on the hypothesis
that sequence-specific suppression of gene expression can be achieved by intracellular
hybridization between mRNA and a complementary antisense species. The formation
of a hybrid RNA duplex may then interfere with the processing/transport/translation
and/or stability of the target ARMP mRNA. Hybridization is required for the
antisense effect to occur; however, the efficiency of intracellular hybridization is low
and therefore the consequences of such an event may not be very successful.
Antisense strategies may use a variety of approaches including the use of antisense
oligonucleotides, injection of antisense RNA and transfection of antisense RNA
expression vectors. Antisense effects can be induced by control (sense) sequences;
however, the extent of phenotypic changes is highly variable. Phenotypic effects
induced by antisense effects are based on changes in criteria such as protein levels,
protein activity measurement, and target mRNA levels. Multidrug resistance is a
useful model to study molecular events associated with phenotypic changes due to
antisense effects, since the multidrug resistance phenotype can be established by
expression of a single gene mdr1 (MDR gene) encoding P-glycoprotein.
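
The complementarity on which the antisense approach relies can be illustrated with a short sketch that derives an antisense DNA oligonucleotide from a window of a target mRNA. This is an illustration only: the function name is hypothetical, the example sequence is a made-up placeholder rather than ARMP sequence, and no selection criteria for effective target sites are implied.

```python
# Illustrative sketch only; the target sequence below is a made-up
# placeholder, not ARMP sequence, and the function name is hypothetical.

# Watson-Crick partners for an RNA target, written as DNA bases.
_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna_oligo(mrna: str, start: int, length: int) -> str:
    """Return the DNA oligonucleotide (5'->3') complementary to
    mrna[start:start + length], where mrna is written 5'->3'."""
    window = mrna.upper()[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the sequence")
    # Reverse the window, then complement each base.
    return "".join(_DNA_PARTNER[base] for base in reversed(window))


if __name__ == "__main__":
    placeholder_mrna = "AUGGCCUUAGCA"  # hypothetical, for illustration
    print(antisense_dna_oligo(placeholder_mrna, 0, 6))
```

The matching sense oligonucleotide (the window itself, written as DNA) would serve as the control sequence mentioned above.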

Transplantation of normal genes into the affected area of the patient can also
be useful therapy for Alzheimer's Disease. In this procedure, a normal hARMP
protein is transferred into a cultivatable cell type such as glial cells, either
exogenously or endogenously to the patient. These cells are then injected
serotologically into the disease affected tissue(s). This is a known treatment for
Parkinson's disease.

Immunotherapy is also possible for Alzheimer's Disease. Antibodies can be 
raised to a mutant ARMP protein (or portion thereof) and then be administered to 
bind or block the mutant protein and its deleterious effects. Simultaneously,
expression of the normal protein product could be encouraged. Administration could
be in the form of a one time immunogenic preparation or vaccine immunization. An
immunogenic composition may be prepared as injectables, as liquid solutions or
emulsions. The ARMP protein may be mixed with pharmaceutically acceptable
excipients compatible with the protein. Such excipients may include water, saline,
dextrose, glycerol, ethanol and combinations thereof. The immunogenic composition
and vaccine may further contain auxiliary substances such as emulsifying agents or
adjuvants to enhance effectiveness. Immunogenic compositions and vaccines may be
administered parenterally by injection subcutaneously or intramuscularly.

The immunogenic preparations and vaccines are administered in such amount 
as will be therapeutically effective, protective and immunogenic. Dosage depends on 
the route of administration and will vary according to the size of the host.

Similar gene therapy techniques may be employed with respect to the E5-1
gene.

The above disclosure generally describes the present invention. A more
complete understanding can be obtained by reference to the following specific 
examples. These examples are described solely for purposes of illustration and are 
not intended to limit the scope of the invention. Changes in the form and substitution 
of equivalents are contemplated as circumstances may suggest or render expedient. 
Although specific terms have been employed herein, such terms are intended in a 
descriptive sense and not for purposes of limitations. 

Example 1. Development of the genetic, physical "contig" and transcriptional map
of the minimal co-segregating region

The CEPH MegaYAC and the RPCI PAC human total genomic DNA libraries 
were searched for clones containing genomic DNA fragments from the AD3 region 
5 of chromosome 14q24.3 using oligonucleotide probes for each of the ## SSR marker 
loci used in the genetic linkage studies as well as ## additional markers depicted in 
Figure la (Albertsen et aL, 1990; Chumakov et aL, 1992; Ioannu et aL, 1994), The 
genetic map distances between each marker are depicted above the contig, and are 
derived from published data (NIH/CEPH Collaborative Mapping Group, 1992; Wang, 

10 1992; Weissenbach, J et aL, 1992; Gyapay, G et aL, 1994). Clones recovered for 
each of the initial marker loci were arranged into an ordered series of partially 
overlapping clones ("contig") using four independent methods. First, sequences
representing the ends of the YAC insert were isolated by inverse PCR (Riley et al.,
1990), and hybridized to Southern blot panels containing restriction digests of DNA
from all of the YAC clones recovered for all of the initial loci in order to identify
other YAC clones bearing overlapping sequences. Second, inter-Alu PCR was
performed on each YAC, and the resultant band patterns were compared across the
pool of recovered YAC clones in order to identify other clones bearing overlapping
sequences (Bellanne-Chantelot et al., 1992; Chumakov et al., 1992). Third, to
improve the specificity of the Alu-PCR fingerprinting, we restricted the YAC DNA
with HaeIII or RsaI, amplified the restriction products with both Alu and L1H
consensus primers, and resolved the products by polyacrylamide gel electrophoresis.
Finally, as additional STSs were generated during the search for transcribed
sequences, these STSs were also used to identify overlaps. The resultant contig was
complete except for a single discontinuity between YAC932C7 bearing D14S53 and
YAC746B4 containing D14S61. The physical map order of the STSs within the contig
was largely in accordance with the genetic linkage map for this region (NIH/CEPH
Collaborative Mapping Group, 1992; Wang, Z, Weber, J.L., 1992; Weissenbach, J
et al., 1992; Gyapay, G et al., 1994). However, as with the genetic maps, we were
unable to unambiguously resolve the relative order of the loci within the
D14S43/D14S71 cluster and the D14S76/D14S273 cluster. PAC1 clones suggest that
D14S277 is telomeric to D14S268, whereas genetic maps have suggested the reverse
order. Furthermore, a few STS probes failed to detect hybridization patterns in at
least one YAC clone which, on the basis of the most parsimonious consensus physical
map and from the genetic map, would have been predicted to contain that STS. For
instance, the D14S268 (AFM265) and RSCAT7 STSs are absent from YAC788H12
(Figure 3). Because these results were reproducible, and occurred with several
different STS markers, these results most likely reflect the presence of small
interstitial deletions within one of the YAC clones.
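
The STS-content approach to finding overlaps can be illustrated with a minimal
sketch. It assumes a table recording which STSs each clone tested positive for;
the clone and marker names below are hypothetical placeholders, not data from
this specification, and clones are grouped into contigs simply by treating any
shared STS as evidence of overlap.

```python
from itertools import combinations

# Hypothetical STS-content table: clone name -> set of STSs detected in it.
# These names are placeholders for illustration only.
sts_content = {
    "YAC_A": {"STS1", "STS2"},
    "YAC_B": {"STS2", "STS3"},
    "YAC_C": {"STS3", "STS4"},
    "YAC_D": {"STS6"},
}


def overlapping_pairs(content):
    """Return clone pairs (alphabetical order) that share at least one STS."""
    pairs = []
    for a, b in combinations(sorted(content), 2):
        if content[a] & content[b]:
            pairs.append((a, b))
    return pairs


def contigs(content):
    """Group clones into contigs (connected components of the overlap graph)."""
    parent = {clone: clone for clone in content}

    def find(clone):
        # Follow parent links to the representative, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in overlapping_pairs(content):
        parent[find(a)] = find(b)

    groups = {}
    for clone in content:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(group) for group in groups.values())
```

In this toy table YAC_D shares no STS with the other clones, so it forms its own
group, which is the kind of discontinuity described above. A marker missing from
a clone that the map predicts should carry it would show up as an absent entry
in that clone's set.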


Example 2. Cumulative two-point lod scores for chromosome 14q24.3 markers.

Genotypes at each polymorphic microsatellite marker locus were determined
by PCR from 100 ng of genomic DNA of all available affected and unaffected
pedigree members as previously described (St George-Hyslop, P et al., 1992) using
primer sequences specific for each microsatellite locus (Weissenbach, J et al., 1992;
Gyapay, G et al., 1994). The normal population frequency of each allele was
determined using spouses and other neurologically normal subjects from the same
ethnic groups, but did not differ significantly from those established for mixed
Caucasian populations (Weissenbach, J et al., 1992; Gyapay, G et al., 1994). The
maximum likelihood calculations assumed an age of onset correction, marker allele
frequencies derived from published series of mixed Caucasian subjects, and an
estimated allele frequency for the AD3 mutation of 1:1000 as previously described
(St George-Hyslop, P et al., 1992). The analyses were repeated using equal marker
allele frequencies, and using phenotype information only from affected pedigree
members as previously described to ensure that inaccuracies in the estimated
parameters used in the maximum likelihood calculations did not misdirect the analyses
(St George-Hyslop, P et al., 1992). These supplemental analyses did not significantly
alter either the evidence supporting linkage, or the discovery of recombination events.
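
For orientation, the quantity being accumulated can be sketched in its simplest
textbook form. The sketch below assumes phase-known, fully informative meioses
and deliberately ignores the age of onset correction, marker allele frequencies
and disease allele frequency used in the calculations described above, so it
only illustrates what a cumulative two-point lod score is. The function names
are illustrative.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    Z(theta) = log10(L(theta) / L(0.5)), where
    L(theta) = theta**R * (1 - theta)**(N - R).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # A single recombinant excludes complete linkage.
        if recombinants > 0:
            return float("-inf")
        return meioses * math.log10(2.0)
    log_likelihood = (recombinants * math.log10(theta)
                      + nonrecombinants * math.log10(1.0 - theta))
    return log_likelihood - meioses * math.log10(0.5)


def cumulative_lod(families, theta):
    """Sum pedigree-specific lod scores at one recombination fraction.

    `families` is a list of (recombinants, meioses) pairs, one per pedigree.
    """
    return sum(lod_score(r, n, theta) for r, n in families)
```

Under these simplifying assumptions, ten non-recombinant meioses give a lod
score of about 3.0 at theta = 0, which is the conventional threshold for
declaring linkage.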

Example 3. Haplotypes between flanking markers segregated with AD3 in FAD
pedigrees

Extended haplotypes between the centromeric and telomeric flanking markers
on the parental copy of chromosome 14 segregating with AD3 in fourteen early onset
FAD pedigrees (pedigrees NIH2, MGH1, Tor1.1, FAD4, FAD1, MEX1, and FAD2
show pedigree specific lod scores ≥ +3.00 with at least one marker between
D14S258 and D14S53). Identical partial haplotypes (boxed) are observed in two
regions of the disease bearing chromosome segregating in several pedigrees of similar
ethnic origin. In region A, shared alleles are seen at D14S268 ("B": allele size = 126
bp, allele frequency in normal Caucasians = 0.04; "C": size = 124 bp, frequency
= 0.38); D14S277 ("B": size = 156 bp, frequency = 0.19; "C": size = 154 bp,
frequency = 0.33); and RSCAT6 ("D": size = 111 bp, frequency = 0.25; "E": size =
109 bp, frequency = 0.20; "F": size = 107 bp, frequency = 0.47). In region B,
alleles of identical size are observed at D14S43 ("A": size = 193 bp, frequency =
0.01; "D": size = 187 bp, frequency = 0.12; "E": size = 185 bp, frequency =
0.26; "I": size = 160 bp, frequency = 0.38); D14S273 ("3": size = 193 bp,
frequency = 0.38; "4": size = 191 bp, frequency = 0.16; "5": size = 189 bp,
frequency = 0.34; "6": size = 187 bp, frequency = 0.02) and D14S76 ("1": size
= bp, frequency = 0.01; "5": size = bp, frequency = 0.38; "6": size = bp,
frequency = 0.07; "9": size = bp, frequency = 0.38). The ethnic origins of each
pedigree are abbreviated as: Ashk = Ashkenazi Jewish; Ital = Southern Italian; Angl
= Anglo-Saxon-Celt; FrCan = French Canadian; Jpn = Japanese; Mex = Mexican
Caucasian; Ger = German; Am = American Caucasian. The type of mutation
detected is depicted by the amino acid substitution and putative codon number or by
ND where no mutation has been detected because a comprehensive survey has not
been undertaken due to the absence of a source of mRNA for RT-PCR studies.

Example 4. Recovery of transcribed sequences from the AD3 interval.

Putative transcribed sequences encoded in the AD3 interval were recovered
using a direct hybridization method in which short cDNA fragments generated
from human brain mRNA were hybridized to immobilized cloned genomic DNA
fragments (Rommens, JM et al., 1993). The resultant short putatively transcribed
sequences were used as probes to recover longer transcripts from human brain cDNA
libraries (Stratagene, La Jolla). The physical locations of the original short clone and
of the subsequently acquired longer cDNA clones were established by analysis of the
hybridization pattern generated by hybridizing the probe to Southern blots containing
a panel of EcoRI digested total DNA samples isolated from individual YAC clones
within the contig. The nucleotide sequence of each of the longer cDNA clones was
determined by automated cycle sequencing (Applied Biosystems Inc., CA), and
compared to other sequences in nucleotide and protein databases using the BLAST
algorithm (Altschul, SF et al., 1990). Accession numbers for the transcribed
sequences in this report are: L40391, L40392, L40393, L40394, L40395, L40396,
L40397, L40398, L40399, L40400, L40401, L40402, and L40403.

Example 5. Locating mutations in the ARMP gene using restriction enzymes.

The presence of the Ala 246 Glu mutation, which creates a DdeI restriction site,
was assayed in genomic DNA by PCR using the end-labelled primer 849
(5'-atctccggcaggcatatct-3') SEQ ID No: 126 and the unlabelled primer 892
(5'-tgaaatcacagccaagatgag-3') SEQ ID No: 127 to amplify an 84 bp genomic exon
fragment using 100 ng of genomic DNA template, 2 mM MgCl2, 10 pmol of each primer,
0.5 U Taq polymerase, and 250 µM dNTPs for 30 cycles of 95°C X 20 seconds, 60°C X
20 seconds, 72°C X 5 seconds. The products were incubated with an excess of DdeI
for 2 hours according to the manufacturer's protocol, and the resulting restriction
fragments were resolved on a 6% nondenaturing polyacrylamide gel and visualized
by autoradiography. The presence of the mutation was inferred from the cleavage of
the 84 bp fragment due to the presence of a DdeI restriction site. All affected
members of the FAD1 pedigree (filled symbols) and several at-risk members ("R")
carried the DdeI site. None of the obligate escapees (those individuals who do not get
the disease, age > 70 years), and none of the normal controls carried the DdeI
mutation.
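
The logic of this assay, a point mutation creating a new recognition site, can
be sketched as a simple motif search. DdeI recognizes the sequence CTNAG. The
two short sequences below are hypothetical and are not the 84 bp amplicon, whose
sequence is not reproduced in this Example; they only show how a single base
change can introduce a site.

```python
import re

# DdeI recognition motif: C^TNAG, where N is any base.
# A lookahead is used so that overlapping occurrences would also be reported.
DDEI_MOTIF = re.compile(r"(?=(CT[ACGT]AG))")


def ddei_sites(sequence):
    """Return 0-based start positions of DdeI recognition sites."""
    return [match.start() for match in DDEI_MOTIF.finditer(sequence.upper())]


# Hypothetical sequences differing by one C -> A change (illustration only).
hypothetical_wild_type = "GGATCTGCGTTACC"
hypothetical_mutant = "GGATCTGAGTTACC"
```

Here the wild type sequence has no DdeI site, while the single-base change in
the mutant creates one, which is what the cleavage of the labelled PCR product
reports on the gel.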

Example 6. Locating mutation in the ARMP gene using allele specific
oligonucleotides.

The presence of the Cys 410 Tyr mutation was assayed using allele specific
oligonucleotides. 100 ng of genomic DNA was amplified with the exonic sequence
primer 885 (5'-tggagactggaacacaac-3') SEQ ID No: 128 and the opposing intronic
sequence primer 893 (5'-gtgtggccagggtagagaact-3') SEQ ID No: 129 using the above
reaction conditions except 2.5 mM MgCl2, and cycle conditions of 94°C X 20
seconds, 58°C X 20 seconds, and 72°C for 10 seconds. The resultant 216 bp genomic
fragment was denatured by 10-fold dilution in 0.4 M NaOH, 25 mM EDTA, and was
vacuum slot-blotted to duplicate nylon membranes. The end-labelled "wild type"
primer 890 (5'-ccatagcctgtttcgtagc-3') SEQ ID No: 130 and the end-labelled "mutant"
primer 891 (5'-ccatagcctAtttcgtagc-3') SEQ ID No: 131 were hybridized to separate
copies of the slot-blot filters in 5 X SSC, 5 X Denhardt's, 0.5% SDS for 1 hour at
48°C, and then washed successively in 2 X SSC at 23°C and 2 X SSC, 0.1% SDS
at 50°C and then exposed to X-ray film. All testable affected members as well as
some at-risk members of the AD3 (shown) and NIH2 pedigrees (not shown) possessed
the Cys 410 Tyr mutation. Attempts to detect the Cys 410 Tyr mutation by SSCP
revealed that a common intronic sequence polymorphism migrated with the same
SSCP pattern.
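
The two allele specific oligonucleotides differ at a single central base, which
is what allows differential hybridization. The sketch below locates that
difference in primers 890 and 891 as given above and estimates a melting
temperature for each using the Wallace rule, 2(A+T) + 4(G+C). The Wallace rule
is a rough rule of thumb for short oligonucleotides and is an assumption added
here for illustration; it is not the basis stated in this Example for the
chosen hybridization and wash temperatures.

```python
WILD_TYPE_890 = "ccatagcctgtttcgtagc"  # primer 890, from the text
MUTANT_891 = "ccatagcctAtttcgtagc"     # primer 891, from the text


def mismatches(oligo_a, oligo_b):
    """Return (1-based position, base in a, base in b) for each difference."""
    if len(oligo_a) != len(oligo_b):
        raise ValueError("oligonucleotides must be the same length")
    pairs = zip(oligo_a.upper(), oligo_b.upper())
    return [(i + 1, x, y) for i, (x, y) in enumerate(pairs) if x != y]


def wallace_tm(oligo):
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

The single G to A difference sits at position 10 of the 19-mer, so a mismatched
duplex is destabilized near its centre, where the effect on hybridization is
greatest.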

Example 7. Northern hybridization demonstrating the expression of ARMP protein
mRNA in a variety of tissues.

Total cytoplasmic RNA was isolated from various tissue samples (including
heart, brain and different regions of the brain, placenta, lung, liver, skeletal
muscle, kidney and pancreas) obtained from surgical pathology using standard
procedures such as CsCl purification. The RNA was then electrophoresed on a
formaldehyde gel to permit size fractionation. A nitrocellulose membrane was
prepared and the RNA was then transferred onto the membrane. 32P-labelled cDNA
probes were prepared and added to the membrane in order for hybridization between
the probe and the RNA to occur. After washing, the membrane was wrapped in plastic
film and placed into imaging cassettes containing X-ray film. The autoradiographs
were then allowed to develop for one to several days. The positions of the 28S and
18S rRNA bands are indicated. Sizing was established by comparison to standard RNA
markers. Analysis of the autoradiographs revealed a prominent band at 3.0 kb in
size. These northern blots demonstrated that the ARMP gene is expressed in all of
the tissues examined.

Example 8: Eukaryotic and Prokaryotic Expression Vector Systems

Eukaryotic and prokaryotic expression systems have been generated using two
different classes of ARMP nucleotide cDNA sequence inserts. In the first class,
termed full-length constructs, the entire ARMP cDNA sequence was inserted into the
expression plasmid in the correct orientation, and included both the natural 5' UTR
and 3' UTR sequences as well as the entire open reading frame. The open reading
frames bear a nucleotide sequence cassette which allows either the wild type open
reading frame to be included in the expression system or, alternatively, single
mutations or a combination of double mutations to be inserted into the open reading
frame. This was accomplished by removing a restriction fragment from the wild type
open reading frame using the enzymes NarI and PflMI and replacing it with a similar
fragment generated by reverse transcriptase PCR and which bears the nucleotide
sequence encoding either the Met146Leu mutation or the His163Arg mutation. A second
restriction fragment was removed from the wild type normal nucleotide sequence for
the open reading frame by cleavage with the enzymes PflMI and NcoI and replaced
with restriction fragments bearing either the nucleotide sequence encoding the
Ala246Glu mutation, or the Ala260Val mutation or the Ala285Val mutation or the
Leu286Val mutation, or the Leu392Val mutation, or the Cys410Tyr mutation.
Finally, a third variant bearing combinations of either the Met146Leu or His163Arg
mutations in tandem with the remaining mutations was made by linking the NarI-PflMI
fragment bearing these mutations and the PflMI-NcoI fragments bearing the remaining
mutations.
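
The cassette design implies a small combinatorial set of inserts. As a rough
sketch, assuming each double mutant pairs exactly one NarI-PflMI mutation with
one PflMI-NcoI mutation (which is how the text above reads), the possible open
reading frames can be enumerated as follows. The variable and function names
are illustrative only.

```python
from itertools import product

# Mutations carried on each exchangeable restriction fragment, from the text.
NARI_PFLMI_MUTATIONS = ["Met146Leu", "His163Arg"]
PFLMI_NCOI_MUTATIONS = [
    "Ala246Glu", "Ala260Val", "Ala285Val",
    "Leu286Val", "Leu392Val", "Cys410Tyr",
]


def enumerate_constructs():
    """List wild type, single-mutant and double-mutant open reading frames.

    Each construct is a tuple of mutation names; the empty tuple is wild type.
    """
    constructs = [()]
    constructs += [(m,) for m in NARI_PFLMI_MUTATIONS + PFLMI_NCOI_MUTATIONS]
    constructs += list(product(NARI_PFLMI_MUTATIONS, PFLMI_NCOI_MUTATIONS))
    return constructs
```

Under that assumption the cassette yields one wild type, eight single-mutant
and twelve double-mutant inserts.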

A second class of cDNA inserts bearing wild type or mutant cDNA
sequences was constructed by removing from the full-length cDNA the 5' UTR and
part of the 3' UTR sequences. The 5' UTR sequence was replaced with a synthetic
oligonucleotide containing a KpnI restriction site and a Kozak initiation site
(oligonucleotide 969: ggtaccgccaccatgacagaggtacctgcac) SEQ ID No: 138. The 3'
UTR was replaced with an oligonucleotide which corresponds to position 2566 of the
cDNA and bears an artificial EcoRI site (oligonucleotide
970: gaattcactggctgtagaaaaagac) SEQ ID No: 139. Mutant variants of this construct
were then made by inserting the same mutant sequences described above at the
NarI-PflMI and PflMI-NcoI sites described above.

For eukaryotic expression, these various cDNA constructs bearing wild type
and mutant sequences were cloned into the expression vector pZeoSV (Invitrogen).
For prokaryotic expression, two constructs were made using the glutathione
S-transferase fusion vector pGEX-KG. The inserts which have been attached to the GST
fusion nucleotide sequence are the same nucleotide sequence described above
generated with the oligonucleotide primers 969, SEQ ID No: 138 and 970,
SEQ ID No: 139, bearing either the normal open reading frame nucleotide
sequence or bearing a combination of single and double mutations as described above.
This construct allows expression of the full-length protein in mutant and wild type
variants in prokaryotic cell systems as a GST fusion protein which will allow
purification of the full-length protein followed by removal of the GST fusion product
by thrombin digestion. The second prokaryotic cDNA construct was generated to
create a fusion protein with the same vector, and allows the production of the amino
acid sequence corresponding to the hydrophilic acidic loop domain between TM6 and
TM7 of the full-length protein, as either a wild type nucleotide sequence (thus a wild
type amino acid sequence for fusion proteins) or as a mutant sequence bearing either
the Ala285Val mutation, or the Leu286Val mutation, or the Leu392Val mutation.
This was accomplished by recovering wild type or mutant sequence from appropriate
sources of RNA using the oligonucleotide primers 989: ggatccggtccacttcgtatgctg SEQ
ID No: 140, and 990: ttttttgaattcttaggctatggttgtgttcca SEQ ID No: 141. This allows
cloning of the appropriate mutant or wild type nucleotide sequence corresponding to
the hydrophilic acidic loop domain at the BamHI and the EcoRI sites within the
pGEX-KG vector.

These prokaryotic expression systems allow the holo-protein or various
important functional domains of the protein to be recovered as fusion proteins and
then used for binding studies, structural studies, functional studies, and for the
generation of appropriate antibodies.

Example 9: Identification of Three New Mutations in the ARMP Gene

Three novel mutations have been identified in subjects affected with early
onset Alzheimer's disease. All of these mutations co-segregate with the disease, and
are absent from at least 200 normal chromosomes. The three mutations are as
follows: a substitution of C by T at position 1027, which results in the substitution
of alanine 260 by valine; substitution of C by T at position 1102, which results in the
substitution of alanine at 285 by valine; and substitution of C by G at position 1422,
which results in the substitution of leucine 392 by valine. Significantly, all of these
mutations occur within the acidic hydrophilic loop between putative TM6 and TM7.
Two of the mutations (A260V; A285V) and the L286V mutation are also located in
the alternative spliced domain.
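
The relationship between each single-base change and the resulting amino acid
substitution can be checked against the standard genetic code. In the sketch
below the codon table is the standard one; the specific codons (GCT for alanine,
CTG for leucine, TGT for cysteine) are assumptions read off the allele specific
oligonucleotide sequences given in these Examples under an assumed reading
frame, and are used only to illustrate the principle.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codon):
    """Translate one DNA codon to its one-letter amino acid."""
    return CODON_TABLE[codon.upper()]


def substitute(codon, position, new_base):
    """Return the codon with the base at 1-based `position` replaced."""
    bases = list(codon.upper())
    bases[position - 1] = new_base.upper()
    return "".join(bases)


def describe(codon, position, new_base):
    """Return (old amino acid, new amino acid) for a single-base change."""
    mutant = substitute(codon, position, new_base)
    return translate(codon), translate(mutant)
```

For example, a C to T change at the second position of an alanine codon (GCN)
always gives valine (GTN), and a C to G change at the first position of a CTN
leucine codon likewise gives valine, consistent with the substitutions listed
above.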

The three new mutations, like the other mutations, can be assayed by a variety
of strategies (direct nucleotide sequencing, allele specific oligos, ligation polymerase
chain reaction, SSCP, RFLPs) using RT-PCR products representing the mature
mRNA/cDNA sequence or genomic DNA. We have chosen allele specific oligos.
For the A260V and the A285V mutations, genomic DNA carrying the exon can be
amplified using the same PCR primers and methods as for the L286V mutation. PCR
products were then denatured and slot blotted to duplicate nylon membranes using the
slot blot protocol described for the C410Y mutation.

The Ala260Val mutation was scored on these blots by using hybridization with
end-labelled allele-specific oligonucleotides corresponding to the wild type sequence
(994: gattagtggttgttttgtg) SEQ ID No: 142 or the mutant sequence
(995: gattagtggctgttttgtg) SEQ ID No: 143 by hybridization at 48°C followed by a wash
at 52°C in 3X SSC buffer containing 0.1% SDS. The Ala285Val mutation was
scored on these slot blots as described above but using instead the allele-specific
oligonucleotides for the wild type sequence (1003: tttttccagctctcattta) SEQ ID No: 144
or the mutant primer (1004: tttttccagttctcattta) SEQ ID No: 145 at 48°C followed by
washing at 52°C as above except that the wash solution was 2X SSC.

The Leu392Val mutation was scored by amplification of the exon from
genomic DNA using primers 996 (aaacttggattgggagat) SEQ ID No: 147 and 893
(gtgtggccagggtagagaact) SEQ ID No: 129 using standard PCR buffer conditions
excepting that the magnesium concentration was 2 mM and cycle conditions were
94°C for 10 seconds, 56°C for 20 seconds, and 72°C for 10 seconds. The resultant
200 base pair genomic fragment was denatured as described for the Cys410Tyr
mutation and slot-blotted in duplicate to nylon membranes. The presence or absence
of the mutation was then scored by differential hybridization to either a wild type
end-labelled oligonucleotide (999: tacagtgttctggttggta) SEQ ID No: 146 or with an
end-labelled mutant primer (100: tacagtgttgtggttggta) SEQ ID No: 148 by hybridization
at 45°C and then successive washing in 2X SSC at 23°C and then at 68°C.

Example 10: Polyclonal Antibody Production

Peptide antigens were synthesized by solid-phase techniques and purified by
reverse phase high pressure liquid chromatography. Peptides were covalently linked
to keyhole limpet hemocyanin (KLH) via disulfide linkages that were made possible
by the addition of a cysteine residue at the peptide C-terminus. This additional residue
does not appear normally in the protein sequence and was included only to facilitate
linkage to the KLH molecule. A total of three rabbits were immunized with
peptide-KLH complexes for each peptide antigen and were then subsequently given
booster injections at seven day intervals. Antisera were collected for each peptide
and pooled and IgG precipitated with ammonium sulfate. Antibodies were then affinity
purified with Sulfo-link agarose (Pierce) coupled with the appropriate peptide. This
final purification is required to remove non-specific interactions of other antibodies
present in either the pre- or post-immune serum.

The specific sequences to which we have raised antibodies are:
Polyclonal antibody 1: NDNRERQEHNDRRSL (C) - residues 30-45
Polyclonal antibody 2: KDGQLIYTPFTEDTE (C) - residues 109-120
Polyclonal antibody 3: EAQRRVSKNSKYNAE (C) - residues 304-319
Polyclonal antibody 4: SHLGPHRSTPESRAA (C) - residues 346-360
The non-native cysteine residue is indicated at the C-terminal by (C). These
sequences are contained within various predicted domains of the protein. For
example, antibodies 1, 3, and 4 are located in potentially functional domains that are
exposed to the aqueous media and may be involved in binding to other proteins
critical for the development of the disease phenotype. Antibody 2 corresponds to a
short linking region situated between the predicted first and second transmembrane
helices.

Example 11: Identification of two mutations in the E5-1 gene

RT-PCR products corresponding to the E5-1 ORF were generated from RNA
of lymphoblasts or frozen post-mortem brain tissue using oligonucleotide primer pairs
1021: 5'-cagaggatggagagaatac and 1018: 5'-ggctccccaaaactgtcat (product = 888 bp);
and 1071: 5'-gccctagtgttcatcaagta and 1022: 5'-aaagcgggagccaaagtc (product = 826
bp) by PCR using 250 µM dNTPs, 2.5 mM MgCl2, 10 pmol oligonucleotides in
10 µl, cycled for 40 cycles of 94°C X 20 seconds, 58°C X 20 seconds, 72°C X 45
seconds. The PCR products were sequenced by automated cycle sequencing (ABI,
Foster City, CA) and the fluorescent chromatograms were scanned for heterozygous
nucleotide substitutions by direct inspection and by the Factura (ver 1.2.0) and
Sequence Navigator (ver 1.0.1b15) software packages (data not shown).

Asn141Ile: The A→T substitution at nucleotide 787 creates a BclI restriction
site. The exon bearing this mutation was amplified from 100 ng of genomic DNA
using 10 pmol of oligonucleotides 1041: 5'-cattcactgaggacacacc (end-labelled) and
1042: 5'-tgtagagcaccaccaaga (unlabelled), and PCR reaction conditions similar to
those described below for the Met239Val. 2 µl of the PCR product was restricted
with BclI (NEBL, Beverly, MA) in 10 µl reaction volume according to the
manufacturer's protocol, and the products were resolved by non-denaturing
polyacrylamide gel electrophoresis. In subjects with wild type sequences, the 114 bp
PCR product is cleaved into 68 bp and 46 bp fragments. Mutant sequences cause the
product to be cleaved into 53 bp, 46 bp and 15 bp.

Met239Val: The A→G substitution at nucleotide 1080 deletes a NlaIII
restriction site, allowing the presence of the Met239Val mutation to be detected by
amplification from 100 ng of genomic DNA using PCR (10 pmol oligonucleotides
1034: 5'-gcatggtgtgcatccact, 1035: 5'-ggaccactctgggaggta; 0.5 U Taq polymerase, 250
µM dNTPs, 1 µCi alpha-32P-dCTP, 1.5 mM MgCl2, 10 µl volume; 30 cycles of 94°C
X 30 seconds, 58°C X 20 seconds, 72°C X 20 seconds) to generate a 110 bp
product. 2 µl of the PCR reaction were diluted to 10 µl and restricted with 3 U of
NlaIII (NEBL, Beverly, MA) for 3 hours. The restriction products were resolved by
non-denaturing polyacrylamide gel electrophoresis and visualized by autoradiography.
Normal subjects show cleavage products of 55, 35, 15 and 6 bp, whereas the mutant
sequence gives fragments of 55, 50 and 6 bp.
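
Both assays come down to comparing fragment sizes before and after a
restriction site is gained or lost. A minimal sketch of that bookkeeping
follows. The cut coordinates are hypothetical and were chosen only so that the
fragment sizes match those stated above; the actual positions of the sites
within the amplicons, and the order of the fragments, are not given in this
Example. The NlaIII case uses the sum of the stated fragment sizes (111 bp) as
the product length.

```python
def fragments(length, cut_positions):
    """Return fragment sizes, left to right, for a linear product.

    `cut_positions` are distances from the left end at which cleavage occurs.
    """
    cuts = sorted(set(cut_positions))
    if any(cut <= 0 or cut >= length for cut in cuts):
        raise ValueError("cut positions must fall strictly inside the product")
    edges = [0] + cuts + [length]
    return [right - left for left, right in zip(edges, edges[1:])]


# Asn141Ile / BclI: the mutation adds a site to the 114 bp product.
# Cut coordinates are assumed, chosen to reproduce the stated sizes.
bcli_wild_type = fragments(114, [68])
bcli_mutant = fragments(114, [53, 68])

# Met239Val / NlaIII: the mutation removes one of three sites.
# Cut coordinates and the 111 bp total are assumed from the stated sizes.
nlaiii_normal = fragments(111, [55, 90, 105])
nlaiii_mutant = fragments(111, [55, 105])
```

Gaining a site splits one band into two (68 bp into 53 bp and 15 bp), while
losing a site merges two adjacent bands into one (35 bp and 15 bp into 50 bp),
which is the pattern reported for the two mutations.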

Although preferred embodiments of the invention have been described herein
in detail, it will be understood by those skilled in the art that variations may be made
thereto without departing from the spirit of the invention or the scope of the appended
claims.

GAGGAA75GG AAGCCCAGAG GGACAG7CA7 C7ACGGCG7C A7CGC7C7AC ACCTCAG7CA 

1330 " 1240 1350 1350 1370 USO 

CGAGC7CC7G 7CCAGGAAC7 77CCAGCAG7 ATC27CGC7G G7GAAGACCC AGAGGAAACG 

1390 1400 1410 1420 1430 1440 

GGAG7AAAAC T7GGA77GGG AGATTTCA77 TTC7ACAG7G TTC7GG77GG 7AAAGCC7CA 

t4 S<j 1460 14"0 1430 1430 1500 

GCAACAGCGA G7GGAGAC7G GAACACAACC A7AGCG7GT7 7CG7AGCCAT Ai, .AA77GGT 

1510 1520 * 1530 1540 1550 1550 

TTG7GCCT7A CA77A77AG7 CC77GC-AT7 T7CAAGAAAG CATTGCCAGC TCTTCGAAtC 

1S70 1330 1330 1500 1510 1=20 

7CCATCACC7 T7GGGC77G7 T77C7AC777 GCCACAGAT7 ATC77G7ACA GC ATS 

1630 1540 1550 1550 1570 1530 

GACCAACTAG CA7TCCA7CA AT7T7A7AT7 TAGCATATTT GCGG77ACAA TCCCA7GGAT 

1$9C 1700 1710 1720 1730 1740 

G7TTC77C77 TGAC7A7AAC CAAATC7GGG GAGGACAAAG GTGA7T7TCC TC7G7CCACA 



17SG 1750 1770 1730 1730 13GO 

TC7AACAAAG TCAAGAT7CG CSGCTGGAC7 T77GCAGC77 CC77CCAAG7 C7TCC7C-ACC 

1310 1320 1330 1340 135 0 1350 

ACCT7GCACT A77GGAC777 GGAAGGAGG7 GCC7ATAGAA AACGAT777G AACATACTTC 

1370 1380 1330 1500 1510 1320 

ATCGCAGTGG AC7G7G7CC7 CGGTGCAGAA AC7ACCAGAT TTGAGGGACG AGGTCAAGGA 

1330 1340 1330 1350 1370 1330 

GATATCATAG GCCCGGAAGT TGCXG7GCCC CATCAGCAGC T7CACGCG7G G7CACAGGAC 

1330 200O 2010 2020 2030 2040 

GAT7TCAC7G ACAC7GCGAA CTCTCACCAC TACCGGTTAC CAAGAGG7TA CG7GAAG7GC 

20S0 2050 2070 2030 2030 2100 

TTTAAACCAA AC GGAAC7CT TCATC77AAA C7ACACG77G AAAATCAACC CAA7AAT7C7 

211C 2120 2130 2140 2150 2150 

G7A77AAC7G AAT7CTGAAC T7TTCAGGAG C7AC7G7GAG GAAGACCAGG CACCAGCAGC 



2. 
2I7C 213C ;:?0 2-00 2210 2220 

ACAA7GCCGA A7GGAGAGC7 GGCCA0JGG7 7CCAGC77 ZZ C777GAT777 77GCTCCAGA 

2230 2240 2250 22SJ 2270 2230 

C7CA7CC7TT T7AAA7GACA C77G77T7CC CC7C7C777G AG7CAAG7CA AA7A 7 "AC A 

2230 2300 2310 2320 2230 2340 

TGCC777GCC AA7TC77C77 C7CAAGCAC7 GACAC7CA7T ACCS7C7G7G AT7GCCA7TT 

21HG 2353 2373 2330 23*0 2400 

CTTCCCAAGC CCAG7C7GAA CCTMGC77a C777A7CC7A AAAG7TTTAA CC7CAGGT7C 

2*;0 . 2425 2430 2440 2*50 2460 

CAAA7TCAG7 AAA7777GGA AACASTACAG C7A77TC7CA 7CAA7TCTC7 A7CA7GT7GA 

2470 2430 2430 2S0Q 2510 2520 

AG7CAAA77T 0GA77T7CCA CCAAA77C7G AA77T57AGA CA7AC77G7A CSC7CAC77S 

2520 2S40 2530 2550 2570 2530 

CCCCAGA7GC CTCC7C7G7C C7CA77C77C 7C7CCCACAC AAGCAGTCT7 T77C7ACAGC 

2590 2500 26L0 2520 2630 2 = 40 

CAG7AAGGCA GC7C7C7CS7 GGTAGCAGAT GG7CCCAC77 A77C7AGGG7 CT7ACTC7TT 

2650 2650 2570 2530 2630 2700 

G7ATSA7GAA AACAA7G7G7 TATGAA7CGG 7GC7G7CAGC CC7GC7G7CA GACCT7C77C 

2710 2720 2730 2740 2750 2750 

CACAGCAAA7 GACA7G7ATG CCCAAAGCGG TAGAA77AAA CAAGAG7AAA A7CCC7G77S 



2770 2730 27*0 2900 2910 2320 
AAGCAAAAAA. AAAAAAAAAA AAAAAAAAAA A 



MTSL?APLSTeQHAC^2DSEI5HT71ffiQHDH3iraQEHHDHI^IfiH?2?IfiKG3PQGHSRQ7VZQ 
DE2Z3EZI.TIiCrGA:<r7yTMI^^VT^^ 

I-rvx^r/ratXTrrc^SIHWCGPLR^ 

VXCPXGPti^VrrAQElWETIrfPAIXXSSTXVWLVV^^ 

DTVAZNDCGGFSEZ»&AQRDSHI£PH3£TPESIl\AVQ£I^SSI 

VL7raOSAT»S60WrrTIACFVAIIJGLCL^^ 

PMBQtAPHQFYI 



10 2a 30 * 4a so so 

ACCAriACAT^C CCCAww GC2CAAACCT AGCCTGCSA2 CCKCCwCCX CCCC5wuA£ 

7C SO 30 100 110 12C 

fcCACAAKAA C3ACACAAG ACACCAGCCC TTC^CC^r ICCACGAGAA 

140 150 ioo 170 iso 
cacatsagas aaasutccc AACACcrrrr o : .: 1 ^,: 1 :^ ACAAGsrArr ictctctagc 

I iso 200 210 220 220 240 

TrrcaAra acacacazac ctscacctt: <ncrrAcrrc c\gaasxc: Afia asc r j *. 

2£Q 260 270 230 230 300 

dcacixac TCcaccAccs ccasccss*; czagaasgac ackxaacaac gsc^gc^xa 

| 310 320 320 340 350 3a0 

xatgscags CAGAfficrrs ACAAcrraa (staatasc: aatc sgc ssc ccc^cactaa 

i 

| 370 3S0 350 4G3 410 420 

c^uacw cr^ r s aAc aagacgagga gsaagacsaa <3A3cr»^r tsaaatatcs 

| 430 440 450 450 473 4S0 

aGCCAACcas ctcatcascc TcrrrGTCcr csrcAccrrc tccatc^t: rereads: 

i 

I 430 500 510 520 510 540 

c^ccatcua -rcAdCAccr rcr^acrrs gmcczcztt cagcoatc: ACAcrcATT 

1 

I 550 S$Q 570 . 550 530 6CQ 

GCACAAGAC ACTGACACTt: TAGOCCAAAG ACCTCTCCAC TC3A2CCTGA AXCC33CCAT 

( 510 €20 530 $40 £50 ScO 

cvrsAracr ctxaxtsxca ttxtcaccxt ccrcrcsru (Trccrraa aasacakis 

I 670 530 530 700 710 720 

CTACAAffiTrC A1CCACSG27 GSCTtAJtM? TTCAXTCTG TTGTTCCwT TCrrtTTTTC 

I 720 740 750 760 770 750 

getcvttzac rai«3GAAG tacttaagac cacvscxc yc c sr cs Acg acjtwIcact 

730 800 310 820 830 843 

ACCACTCCTA ATCTSCAXXT GGCSTGTCCr CSMVSSVrT GCCAZCCACT CCAAACSCCC 

850 860 870 580 850 500 

rrCGACTS CACCM3C27 ATCTCATTA7 GA236T3CC CPC3MQC0C TCS7AT7TAT 



T 



520 520 530 540 550 560 

CjUGTACCTC C CC I A AJI55A CmTSSC? CAICT23GC? CTGATTJCAG TA£AIGACTT 

570 580 530 1000 1010 1020 

IGGCTGTT TOOTTCCCi AAGGCCCACT V CUTA a a^; <3TO3AAACAG C2CAC3AAAG 



cLrc 

I 



1030 1040 ioso ioea ao?o icso 

A^KKMACT CrCCTXAG C2CTIA7C7A TXC2CAACA. ATCSTCTSST TGGTSAATA? 

1050 U00 1110 1120 1130 1140 

CGCtGAACOA OCCCAAAJ3 CTCAAACCAG GaTACCCAAG fcACCCCAAOT A13ACVCACA 

| 1150 1160 1170 USO 1130 1200 

AAGAGCCGAG ACACACACAC AGGACACTC5 7XTCCSAAC CA75AZC3IC CCTTOCrSA 

1210 1220 1330 1240 1250 1260 

chACTCCCAC CC2CAAACAS ACAGTCACCT OGGGCCCOC CCCXCACTC CCSAfltCAAC 



; .;o;8o 



1270 UH: 1250 LJOQ 1310 13 1 C 

accrxrsrc cagsucttt ctsscaccat ictwcmt gaacacccs; aggaaagagg 

1330 1240 1350 ' 12SG 1370 133C 

agtaaaact? ggactsgcac Arrrtsrrrr cncAu;^r ctcsttsgea aggccxagc 

: : 1330 1400 1410 1420 1430 1440 

AACCGCCACTT CCAGACXCA ACACAACCA? ACCCTGCTnC GSUSCZXRC 

| 1450 1460 1470 1430 14S0 1500 

qSgCC T S UR T^CICCTCC TCXCXTXa tSSCCAGCCS SCCCCXXCIC 

i 

J 1510 - 1550 f 1530 1540 1550 1550 

CAczAcrrrc GGsrrcorcjr tcajcstcsc cacsansc c rr -nscA se ccnc\rs» 

| 1S70 1530 1530 l£Ca 1*10 132C 

ccaAcrrxA rrcivr^acr mACArctA (jczjr^iu- acttacaaca rasv .v:v:c 

I 1*30 1540 1530 lf*0 1S73 1510 

rarrixa cr accaaaaaca caaaaacaga gagcaasxc <sggaggaga c:wi« crr 

1S90 1700 1710- 1710 1730 1740 

i c: ? s t 5 KC tacsraAca aaggcaggac tcacctcga cttctccagc rrcrrrczsA 

1750 1750 1770 1730 1790 1300 

arcrccczAG cacrrxAc tactsgacts tcsaAasua csrcaaa gsucottt 

1410 1320 1330 1840 1350 13€0 

eruaiccA Tcscrmcc Ma cxsuic crrsaaraac ctsusg^a aggacaagsa 

J 1870 138Q 1350 1200, 1510 1330 

aatctcctgg gccaaggagc jscsax r c tccsccrrr <ssrcncGG actxaar? 

1530 1340 1550 1560 1370 1380 
2ACCC3CAC . 



10 Z0 30 40 SO SO 

^:f^Z?A?L2T K5S£C3SC< 3SC£3Q«3n SCHICK?! SKCSUCSNS* 

L 73 t 30 30 IZQ 1-0 120 

qrvayT^o znnjKAs hvtct/tvt lcst/wat: jtsvarrsoo scLrrrsr^ 

! 130 140 130 150 170 130 

nrsrvseaaii esusaasix svrrMmzi wiraoz vthaktiss uiisttstz 

\ 150 20Q 110 220 230 240 

«sr/7KraT vxrarvrvai u«mgvvqi cesranaLa lqqaslscs auoivto 
t 

j 250 2$0 270 290 2S0 300 

isstowlil AvxsTOtvx vrescjiaH LTmcsas tttvalits3 rawcawas 

I 310 32C 320 340 3S0 360 

xsrasrr^aA. rer$r$c3C Jewess* zaotrsnis? hhstjsssaa 

| 37C 380 390 ' 4CQ " 410 420 

vcz^sgszl? sssssacvx LC^scTirys vlvcmaw gcjw M ri' ia c xvailiglcl 

I 420 440 420 460 470 420 
ttLTJAjroOC «7AX3I3=T ttJ<V*A.-.« TL^THC^ AJSCHZ* 

-J ' J 1-7 20 33 40 50 €0 



C»3AGGGAAA7 GO ivi »7 - -jw . CGAAGACG7C .^-AG\j\^\-*_t^.A Gu.^/*-— . 



? 70 30 30 ;:o ;:o 120 

3 GCCGGG ATT A GTAGCCGTCT CAAC7GGAG7 CCAGTAGGAG AAAGAGGAAG CGTC77GGGC 

1 * 

tj i.*!0 140 LS-2 ISO 170 130 

*<* TCCCTCTCCT 7CACCAAC7G G73AAACTCC CCGCC7CACC CCCCSGG7GT G7CCTT37CC 

ISO 2C0 210 220 230 240 

AGGGGCGAC5 AGCATTCTOG GCGAAG7CCG CAC3CC7C77 GT7CGAGGCG GAAGACGGGG 

250 . 250 270 230 290 200 

TCTGA75CTT 7CTCC77GC7 CGGGKC7GTC TCSAGGCA7G CATC7CCAGT GAC7C77GTG 

310 320 330 340 350 3*0 

TTTGCTGCTG C77CCCTCTC AGATTCT7C7 CACCG7TG7G G7CAGC7CTG C7T7AGGCA? 

370 330 350 400 410 420 

TA77AATCCA TAGTCGAGGC TGGGA7GGGT CACAGAAT7G AGGTGACT77 7CCATAATTC 

420 440 450 -150 470 430 

AGACC7AA7C 7GGGAGCC7G CAAC7CACAA CACCC7TTCC GG7CC77AGA CAGCTTC 



430 500 5;a 520 530 540 

TGGAGGAGAA CACA7GAAAG AAAGAACCTG AAGACGC7TT GTTTTCTGTG AAACAGTATT 

550 5o0 570 530 5?0 SCO 

TC7A7ACAG7 TGC7CCAA7G ACAGAG7TAC C70CACCG77 GTCC7AC7TC CACAA7GCAC 

I • M SIC $20 530 840 €50 SqO 

| pi AGATGTCTGA CCACAACCAC CTCACCAATA CT^ATGACAA TAGAGAACSG CAGGAGCACA 

§ .7.;. «70 €30 S30 TOO 710 720 

ACGACAGACG GAGCG77CGC CACCC7SAGC CA77A7C7AA TGGACGACCC CAGGGTAAC7 

730 740 750 7*3 770 730 

CCCGGCAGG7 GGTCGAGCAA GATCAGGAAG AAGATGAGGA GCTGACAT7G AAATATGGCG 

790 800 810 820 330 340 

CCAAGCATGT GA7CATGCTC TTTCTCCCTG TGACTCTCTG CA7GGTGGTG GTCGTGGCTA 

850 SSO 870 880 890 300 

CCATTAACTC AGTCAGCTTT TATACCCGCA ACGATGGGCA GC7AATCTAT ACCCCAT7CA 

910 920 930 940 950 360 

CACAAGA7AC CGAGACTGTG GGCCAGAGAG CCCTGCACTC AATTCTGAAT CCTGCCATCA 

970 <?90 990 1000 1010 1020 

TGA7CAC-7G7 CAT7CTTG7C ATGAC7A7CC TCC7CG7GG7 TC7GTA7AAA 7ACACGTGC7 

1030 1040 1050 1050 1070 1080 

ATAACGTCAT CCATCCC7GC CTTATTATAT CATCTCTATT GTTGCTGTTC TTTTTTTCAT 



use iito ::hc use :20c 

cac:gc7ga7 ctggaattts gg7GT^t:« gaatca:— c cat: cactcc aaaggtccac 

t 2 -q 1330 1^33 1240 1250 1250 

77CGAC7CCA CCACCCA7A7 CTCATTATOA 77AG7GC2C7 CA-Tt:CCCC7G G7GT77A7CA 

1270 1230 , 1250 1200 1310 1320 

AG7ACC7CCC 7GAA7GGACT CCG7GCC7CA TC77GCC7GT GA7T7CAG7A 7ATC-AT7TAG 

U30 1340 1350 13*5 1270 1230 

TC-C-C7G7TT7 CTGTCCGAAA GGTCCACTTC C-7A7GC7GG7 TCAAACAGC7 CACCAGAGAA 

US0 1*00 1413 1420 1430 1440 

A7GAAACGC7 T777CCAGC7 C7CA777AC7 C27CAACAA7 GG7G7GG77G G7GAA7A7C-G 

14 50 UaO 14^0 1430 1430 1500 

CAGAAGGAGA CGCCGAACC7 CAAAGGACAG TA772AAAAA 77CCAAG7A7 AA7GCAGAAA 

l£3Q 1540 1550 1550 

GCACACAAAG GGAG7CACAA GACAC7077G CAC-AGAA7GA 7CA70GCGGG 77CAG7GAGG 

lS 70 1550 1550 * 1*00 1510 1520 

AA7GGGAAGC CCAGACGCAC AGTCATC7AG C-GC27CA7CC C7C7ACACC7 GAG7CACGAG 

1540 1550 1550 1570 1530 

AGCAG7A7CC TCGC7GG7GA AGACCGAGAG GAAAGGGC-AG 

1550 170Q 1710 1720 173Q 1740 

TAAAACT7GG A77CCGAGA7 T7CA7TT7C2 ACAC7G77CT GG77GG7AAA CC^-AGCAA 

:7 = 0 1750 1770 1730 1790 J300 

CAGCCAG7GG AGAC7CGAAC ACAACCA7AG CCTGTTTCGT AGCCATATTA A7TGG . *7C7 

1310 1320 ia30 1340 1350 1850 

CCCTTACAT7 ATTACTCCTT GCGA7T7TCA AGAAACCA77 GCCAGC7C77 CCAA7C7CCA 

1370 1330 1390 1300 1910 1320 

7CACC77TGG GC77G7CT7C TACTTTGCCA OGA77A7C7 TGTACACCCT T77ATGGACC 

l23 0 1940 1950 1550 1970 1330 

l^-^OC^ CCATCAATTT TA7A7C7AGC A7A777GCGG T7AGAA7CCC ATSGArGTTT 

1390 2000 2010 2020 2030 2040 

C77C777CAC 7A7AACCAAA TCTCOGGAGG ACAAAGG7GA TT77CC7S7G TCCACA7CTA 

2fl<?a 2050 2070 2090 2030 2100 

ACAAAG7CAA CA77CCCCCC 7GGAC7T7TG CAGC7TCC77 CCAAG7C77C CTGACCAG27 

2110 2120 . 2130 2140 2150 2150 

TCCAC7A77G GAC7T7GGAA CCAGC7GCCT A7AGAAAACG A7777SAACA TACTTCA7CG 



0 



2 ti= 



.! 



2i~ ] :*-jo 2::;o zzzz 

C AGTCCACT 3 TGTCCTCGGT CCAGAAACTA CCAGArTTGA GGCACGAGGT CAA G C A G ATA 

2240 2250 2260 2270 2280

T^ATAGGCCC GCAAGTTGCT GTGCCCCATC AGCAGCTTGA CGCGTGGTCA CAO»GACGATT 

2290 2300 2310 2320 2330 2340

T C ACT G AC AC TCCCAACTCT CAGGACTACC CG TT AC CAAG AGGTTAGGTG AAGTGGTTTA 

2350 2360 2370 2380 2390 2400

AACCAAACGG AACTCTTCAT CT7AAAC7AC ACGTTGAAAA TCAACCCAAT AATTCTGTAT 

2410 2420 2430 2440 2450 2460

TAACTGAATT CTGAACTTTT CAGGAGGTAC TGT5AGGAAG AC-CAGGCACC AGCAGCAGAA 

2470 2480 2490 2500 2510 2520

TGGGG AA T C G AGAGG T GGGC ACCCC 4 » CCA GCT7CCCTTT GATTTTTTGC TGCAGACTCA 

2530 2540 2550 2560 2570 2580

TCCTTTTTAA ATCLAGACTTG TTT7CCCCTC TCTTTGAGTC AAGTCAAATA TG7AGATGCC 

2600 2610 2620 2630 2640

:tca agcactgaca ctcattaccg tctgtgattg ccArrrcrrc 

2650 2660 2670 2680 2690 2700

CCAAGGCCAG TCTOAACCTG AGGTTGCTTT ATCCTAAAAG TTTTAACCTC ACG7TCCAAA 

2710 2720 2730 2740 2750 2760

TCAGTAAAT TTTGGAAACA GTACAGCTAT TTCTCATCAA TTCTCTATCA TGTTGAAGTC 

2770 2780 2790 2800 2810 2820



AAATTTCGA? TTTCCACCAA ATTCTGAATT TGTAGACATA CTTGTACGC? CACTTGCCCC 

2830 2840 2850 2860 2870 2880

I it AGATGCCTCC TCTCTCC7CA TTCTTCTCTC CCACACAAGC agtctttttc tacagccagt 

2890 2900 2910 2920 2930 2940

AAGC-CAGCTC TGTCGTSGTA CCACATGGTC CCACTTATTC TAGGGTCTTA CTCTTTGTAT 



2950 2960 2970 2980 2990 3000

GATGAAAAGA ATGTGTTATG AA7CGGTGCT GTCAGCCCTG CTGTCAGACC TTCTTCCACA 

3010 3020 3030 3040 3050 3060

GCAAATOAGA TGTATCCCCA AAGCGGTACA ATTAAACAAG AGTAAAATGG CTGTTGAAGC 

3070 3080 3090 3100 3110 3120
AAAAAAAAAA AAAAAAAAAA AAAAAAA 
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10 10 aa 4a so 60 

GcrrrrcofAA ccaacttagg aghttccac:; tccctuagac cnacstcatc rccscGAcs;* 

70 80 90 100 110 120

aaagacttica gttgagccst cattgcaccc ac— tactcc aagcctcgcc aaccaaaazs 

130 140 150 160 170 180

AGACACXCGC XCCAAACACA AAAACAAAAA CAAAAAAACA CTAAATTAA7 T1ANAGSGAA 

190 200 210 220 230 240

CNATTAAArA AATAATAGCA CAGTTSAtAr AGCTTATM? AAAAXTAXAA AGGT5CCAJIA 

250 260 270 280 290 300

rrAArATCTA ATcrrrrsccA gccatcacax tattctaaa? aatcttttcg tggaaatta* 

310 320 330 340 350 360

rcTACAXcrr ttaaaaxctg t^taac-titt tttcagggaa gtgtttaaaa ccrArAAcsr 

370 380 390 400 410 420 

TCCrCTCCAC TACATTACTS TTNCACTCCT GAXCTSGAAT rrTCGTGTGG TCGGAArOA? 

430 440 450 460 470 480 

TTCCATTCAC TCGAAACCTC CACTTC5AC7 CCACCACGCA TA-CTCATTA TOATTAGtCC 

490 500 510 520 530 540

cctcazctcc crGxxorrrA tcaagtacct ccctoaatw acts:«tcgc tcAxcrrws 

550 560 570 580 590 600

TCTCATTTCA GTAXATGGTA AAAC==AAGA CrCATAATTT GTTTSTCACA GGAAXSCCCC 

610 620 630 640 650 660

ACTCCACTCX IVIllliUi: CATCXCT7TA TCXTSATTTA GACAAAATW TAAC5TSTAC 

670 680 690 700 710 720

AxcccAtaac tcttcagtaa atcattaaxt agctatajgta AcrrrrrcAT xrt:AACArrr 

730 740 750 760 770 780

CCGCTCCCCA TCSTAGCTCA TGCCTSTAAt CTtACCACTT TWSACGCTC ACCCSCSCAC 

790 800 810 820 830 840

ATCACCTAAG CCCAGAGTTC AAGACCACCC TCCGCAACAX CCCAAAACCT CSTAJCTACA 

850 860 870 880 890 900

CAAAATACAA AAAXTAGCC^ GCCATGGTGG TCCACACC7S TAGTTCCAGC TACTTAGGAG 

910 920 930 940 950 960
GCTGACGTCC GAGCAXCGAT TGATCCCAGS AGCTCAAGHC TGCAG 



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10 20 30 40 SO 60 

CTT GCAAACT CAZSGArtCC TTTACCTACC TACA7TAXCA ACCrtrtTCA GAATAAAATG 

7C 80 30 100 110 120 

AATTGAGAGT GT2 ACAGTC7 AATTC7ATAT CACArSTAAC TTTTArTTCG ACATATCAG7 

X30 140 ISO 160 170 130 

AArACTocrr r«n.-«»*»*» r;^: r: r r? i^^^^iii^r tttccccaka cagtctcsct 

!50 200 210 220 220 240 

C70TC5CCAG GTTCCAC2CC ftATCGTGCGA TCXTGGCTCA CXOAAAGCTC CACC3CCC5S 

250 260 270 280 250 300 

GTTCAACTSA TTCTCCTGCC TCAGCCIICCC AACTAGHTGG CACTACACGG GXGCGCCACC 

310 320 330 340 3S0 360 

ACCCCTCCCA TAAXrrrCCC M - 1 Z 2AGTA CAGATWCGT TTCACCaacr TGGJJGCACCC 

370 330 390 400 410 420 

TGGTCTTCSA ACTCCTSAHA TCATSArCTS CCTCCCTTAG CC7CCCCAAA GTGCTGGGAr 

430 44C 450 460 470 4a0 
VXCXGSZZTC ACeCACTSTT CCTCCCCCtC 





10 20 30 40 SO fiO 

cctcaxcatg cttcacccog gaccctctsc ccgaacaacg ctcccacaca cnaxaaacaa 

70 80 90 100 110 120 

7GC2CCCGCA CAGGATAGAG AATCCCCCCS CACAGCA7AG AGAAGCCCCC GCACAGCA7A 

130 140 150 ISO 170 ISO 

CACAATCCCC CCTCACAGCA TACAGAACCC CCCGCACACC AXACACAATO CXCrTCACCT 

190 200 f 210 220 230 240 

crccoTrrrr aaccagccaa actaaaatca cacacosoia cacatcattt aacatagaaa 

250 250 270 230 290 300 

rrrcrsTATC rrrTAAirrr tttctaagta GTTTTAcrTA rrrrcAGArr crAizrcrrr 

310 320 330 340 350 360 

ACTAGAAXTA ACCGAXAAAA TAACAA~T3 TCCATAATOA ACCCTATGAA AOlAACXKAA 

370 380 350 400 410 420 

CCTAGGTTTT TTTCATACST CTTCTTCCAG ATTGAArCAA CCTCTGTTCT AAAATTTAAC 

430 440 450 460 470 430 

CCCrCACCGA AAT^XTCAGT TAA CTAXG7T AAAAACCCAG ACrtOTOAtr GAGTTrTGCC 

490 500 510 520 530 540 
TCAAAATCCT rrCArAATTA TCCCTOAA™ TGTSTC . ♦ . , 
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10 20 30 40 SO 60 

MAtccrrcc o ji ' i it^cx ccaxacaagg iaacxxccgg acgxxgcxax ggoxcxcxa 

70 ac ?o loo no na 

xkczzzcxzg gxgxxggcgg ggagxcxctt txagcatccx aaxgxaxxax aaxxaccgxa 

130 140 ISO ISO 170 130 

XAGXGAGCAG XGAGGAXAAC CAGAGGXCAC TCXCCTCACC AXCTXSGXXX XGGXGCCXXT 

190 200 < 210 220 220 240 

XCGCCACCXX CXTXAXXSCA ACCAGXTXXA TCACCAACAX CXTXAXCACC XGXAXCXXGX 

250 250 270 250 230 300 

GCtCACTrcc r^CTCAXCC cs:TAACXAAG AGXACCTAAC CTCCTCCAAA TXGHAGHCCA 

3X0 320 330 340 350 350 

ccjacgxcttg cscxxArra acccagcccc taxtcaasax agagxsgxxc xxcghccaaa 

370 350 320 4C0 410 420 

CSCCfCXCAC ACAAGGAXXX TAAAGTCTTA T7AA27AAGG XAAGAXAGXX CCXXSSAXAX 

430 440 450 460 470 430 

GXGGXCTCAA AXCACAGAAA GCXCAAXXX5 SAAAAAGGXG CTXGGASCXG CACCCAGXAA 

450 SCO 510 520 330 540 

acaagxxxxc axscagcxgx cagxaxxxaa cgxacaxcxc aaaggaxaag xacaaxxcxc 

5« a ScO S70 580 550 600 

TAWJW -£ A rSAACACAGA GAAXGGAGCA AffCCAAGACC CAGGXAAAAG AGAGGACCXG 

<ia 620 630 640 650 660 

AAXGCCXXCA GXGAACAAXG AXAGAXAAXC XAGACXXXXA AACXGCAXAC XXCCX5XACA 

S 70 630 630 700 710 72C 

rrGTrTrrrc ^^^—--^c^ XXXX7AGAAC TCAXAGXOAC GCGXCXCXXG XXAAXCCCAG 

730 740 750 760 770 730 

GXCXAACCGX XACCXXSAXX CXGCXGAGAA tCXSAXXTAC XCAAAAXGXT XXXCTXGXGC 

790 800 310 820 830 640 

XXAXAGAAXG ACAAXAGAGA ACSGCAGGAG CACAACGACA CACGGAGCCX XGGCCACCCX 

a<0 360 870 880 890 900 

CAKCCAXXAX CXAAXCCACG ACCCAGGGXA ACXCCCGGCA GGXCGXSGAX CAAGAXGAGG 

n0 920 930 940 950 960 

AAGAAGAXGA GCASCXGACA TXGAAATAXG HCGSCAACCA TGXGAXCAXG CXCXXXGXCC 

970 980 990 1000 1010 1020 

CTGXCACXCX CXCCAXSGXG CXCCXCGXGG HTACCAXXAA GXCAGXCAGC XXXXAXACCC 

1C30 1040 1050 1060 1070 1080 

GGAAGGAXGG GCAGCXGXAC GTAXGAGXXX XCXXXTAIXA IXCXCAAASC CAGXGXGGCX 

10*0 1100 1110 1120 1130 1140 

TTTCTXXACA GCAXGXCAXC AXCACCXXGA AGCCCXCX3C AXXGAAGGGG CATGACTTAG 

1150 1160 1170 1130 1190 1200 

CTGCACACCC CAXCCXCTCX CAXGGXCACG AGCACXXGAG AGAKCCAGGG CXTAXXACXX 

1210 1220 ' ' 1230 1240 1250 1260 






U GT3CACAAAA CCAACACTCC ACJUGTATGT XZCZttZlZ SZkTTZCZCO 

1570 1230 1290 I3C0 1310 1320 

A7ACCCC7GA AG~ArGC: G AAXTGAACAC ATAAATTCTT TTCCAC~CA CCGtfCArTGG 

1330 1340 13S0 1350 1370 1330 

cccc-ca—g trrcrrcrccc tagaacactc rrrc c rrr r rc rrAcrrsGGs ggatiaaatt 

1330 1400 14.10 1420 143C 1440 

cctctcaxcc cccrcrrcrr ccrsTTAtAr ATAAAcnrrr ggtgccscaa aagaagtacc 

1430 14SC * 1470 14SC 1490 I50G 

ACrCGAAT.YT AAAATTTrCC T2T7AAC7C2 CACCAAGGHA ACTTACrTCf A2ATAGAACC 

1510 1520 1330 1340 1330 1SS0 

cxG«ccarr acagatggaa c\ArsGCAAG cgcacatttg gcacaagcga cgcgaaaggg 

1370 1530 1S50 1600 lolC 1630 

rrcrrvrccc tgacacacgt Gcrcccirccr G?rr<r:c; •»« trcccrcACTG aktagggtta 

1630 1640 1530 1660 1S70 1530 

GAC7GGACAG GCTTAAACTA ATTCCAA~G GJCAATTTAA AGAGAAIJIA7 GGGG75AAX5 

1650 I7C0 1710 1720 173Q 1740 
crrrrcGAcs agtcaaggaa gag:iagctag maggtaactt gaatga . . 



Xa 20 30 40 SO 60 

GICGTAXAAA ACACCAACAX XCCCAtfOTAC AACCACAGCC AACAXCXXCX CCTACCXXCC 

70 aa 50 ico no no 

CCaNCGXGX AAXACCAAGX AXXCZiCCAAX TXOTGAXAAA CXTXCAXXSC AAAGXSACCA 
130 140 ' ISO 160 170 130 

cccxccxxss txaaxacaxt gxcxstgcxx cctxxcacac xacagtagca cagtxsagxg 

15 C 200 210 220 230 240 

XXXCCCCXGS ACACCAXAXS ACCCAXAGAG CXXAAAAXAX XCAGXCXSGC XXXXXACAGA 

2=0 260 270 250 290 300 

CAXGXXXCX5 ACXTXSXTAA XACAAAAXCA ACCCAACXGG XXXAAAXAAX GCACATACTX 

ii 0 2Z0 330 340 330 360 

xcxcxcxciix Agagtag— c agaggtag:*: agxccagaxt agxasggxgg cxxcacg^c 

270 330 390 400 410 420 

AXC3AGGAC XCAAXCXCCX XCXTXCXXCX XXAGC7XCXA ACCXCXAGCX XACXXCAGCC 

4*0 440 450 460 470 460 

xccagcctgg agcccxascc xtcaxtxcxs acagxacgaa gcagxagccg agaaaagaac 

490 SCO 510 520 530 540 

AXAGGACAXG XCACCAGAAX TCTCXCCXXA CAAGXTCCAX ACACAACACA XCXCCXXAGA 

550 550 570 530 530 600 

AGXCAXXSCC CXXACXXGXT CTCAXAGCCA XCCXAAAXAX AAGGGAGXCA GAAGTAAACT 

620 630 640 650 660 

cx2aarrGGcr ccsaaxatxs gcaccxggaa xaaaaaxgxx xxtctgxgaa tgagaaacaa 

670 630 690 7C0 710 720 

GGGCAACAXG GAXAXCXGAC AXXAXCTXAA CACAACXCCA GTTGCAAX7A CXCXSCAGAX 

730 740 750 760 770 780 

GACAGGCACT AAXTAXAAGC CAXAXXACCX XXCXXCXGAC AACCACXTGT CAGCCCtfCGT 

730 800 810 820 830 840 

CCXTTCXGXG CCACAAXCXG GTTCTAXAHC AAGXXCCXAA TAAHCXCXA3 CCSAAAAAAX 

8S0 860 870 880 890 900 

XTCATGAGSX ATXAXAAXXA TXXCAAXAXA AACCACCCAC TASAXGCACC CAGTGXCXGC 

910 920 920 940 950 960 

XXCACAXGTX AACXCCTTCX TTCCAZAXGX TAGACATXTX CTXXGAACCA AXTXXAGAGX 

970 930 990 1QQO 1010 1020 

GXACCXGXXX TXCXCAGGXT AAAAAXXC7X ACCXACCAXX CCXCAGXXCC GGAAAAGXGA 

1(320 1040 1050 106G 1070 1030 

CXXAXAAGAX KCCAAXXGAA XXAAGAAAAA GAAAAXXCXG XGXXCGACCX GG7AAXGXCC 

1030 noa * 11X0 • 1120 11 JO 1140 

KTGGXCAXCX XCAXXAACAC TGAtfCXACCG CTXTXGXGXX TGATXXAIXG XAGAAXCXAX 

11=0 1150 1170 1130 1190 1200 

ACCCCAXXCA HAGAAGAXAC CGAGACXGXG CGCCAGAGAC CCCXCCACTC AAXXCXCAAT 

U10 122C 1220 1240 1250 1250 




cctcccat- \ TCAccAGtfcr cArrsTwrc axcactaotc rccrccrccT tc^ctaxaaa 

U70 1230 1290 UCC 1220 1220 

TACXC^rSCT ASaACCTCAC CATOACACAC ACATCTT7G3 tTTCCACCCT GTTCT2CT2A 

1230 1240 1250 1360 1270 1230 

TccrrcwrA rrcrrtrrcAc actaacttaa crsajcaw aaagaaaaaa tg 2:,w ?ct 

1290 1400 1410 1420 1420 1440 

tctaoacata AczcAATrrr rACTTTtcrr ccrccrcAcr gtgsaaca27 caaaaaazac 

14£0 1460 1470 U3Q 1430 1300 

AAAAAOSAAS CCACGTCCAT CTCTAArtXC ACSCTCAGAG GCTSAGGCAG CAGSArcCCT 

1510 * 1S20 1520 1540 1££0 1560 

TTCGCCCACG AGTTCACAAG CAGC7TCGGC AACSTAGCAA GACCCZCCCT CTATTAAACA 

1570 1330 1530 1500 1510 152Q 

AAACAAAAAA CAAATAT7CC AAGTATTTTA tATCCArSSA ArCTATAXCX CATSAAAAAA 

1S20 1540 1550 1550 1670 1530 

T7ACTGTAAA ATAC^TATAC TATSATTAC3C TATCAACAt- £A&?CATAA? TTATOrTACrr 



P 1590 1700 1710 1720 1720 1740 

□ irccoAttTc AAXcccrrr: taccccattc tctcaakaaa tAAAAGCACA AAACAAAAAA 

1750 1760 1770 1730 17S0 1300 

f;g Aoxro^AAcr gaaaaataaa caxttccata r aasaccaca atztaactm orrrtw^rr 

'■^J 1310 1320 ia20 1340 1350 1360 

%% G-rm??zs:t rrsrrTCAAoc aggcccttcc ccrsrcaccc aggntsgagt caagtgcagx 

S * r = 1370 1330 13S0 1900 1920 1920 

i 1 GCCACSASST CACT5 CAG 






925-3 13- cea 



IC 20 30 40 50 SO 

cacscactka crxccrAAAr citAACs-r^r: tCAAAcACAfl Atctgc;ic3G MA«rAC73 

70 30 90 100 110 120 

ctxdc^Hcr crAArccrcA rcactsaitw <wAMAcrs«v AccnccMca atcac2tc«c 

120 1*0 130 150 170 ISO 

cstctgsaaz h^aoascac ccrsccauu! a?co«aaac ccxssctct a ctaaaaaxag 

190 200 310 220 230 240 

CGUTAAWHWA CCCSACCSM CTCCCCC3CA CACC7AC7CA WAGCCTTAA 

250 " 260 270 230 290 300 

SCAC3AC51AH T^CTTSAACC CaCGAGCCAO AGCHTCTSST GARCTSACA- C=TGC£ACT3 

310 320 330 340 3SO 360 

CACTCCaCTC TOCGCSACQ ACTCAGACCC TCTCTCCTINK AACAAAAAAA AAATCTSrAC 

370 330 390 400 410 420 

TTTTTAAGC3 TTSTCCCACC TCT^AATTA- ACT3AAATCC TTCtrrTCTA CCTOSCSXS 

430 440 450 460 470 430 

GccrcGcrrA t»mma?c T c z a g cs T S crscrcrrrr ttacattca* ttacttsgcg 

490 500 510 520 320 540 

TAAGTTKGA AATTTWCSr CTSTCrTTCA CAAT2AACTA CC22JNGTSC? GT3XACCXAC 

SSO 560 370 330 390 6CO 

caxttaaacc cAzcrAcrrr carsassAAr tactctsaag srrsuraT stccacaxat 

610 620 62G 64C 650 6cQ 

AccftcAiac: TMiaafflu. AAescsvsrc AOTArrAcrA axtsacaca? rcrrcrox^c 

S70 6ao 690 700 710 720 

ctccraccrr asaataagta gaactgaaac saacttaaga cracA<rrau ttctaagcct 

730 740 750 7SO 770 730 

T70GCGAAGS ATTAXATAGC CTTCSUTTaC CAAOTCTTCr GCTAXCAGAA TGrTTSTAftA 

790 800 310 620 830 840 
CAAACOT-niT CAASCAAT3G TATAAANACC AAAAATAAtr CA3? 






10 20 30 40 50 60



gzzztzcccx rcrrcrccAc acagtttotg ccrracArtA rrAcrccrrs ccasttttcaa 

70 SO 90 ICO 110 12Q 

caaaccatto TCAccrcrrc oaxctccat cAcrtrrccs crrcrrrrcr actttcccac 

130 140 ISO i£0 170 130 

Acaxraxcrr otacagcctt ttatcoacca ArrAGCAixc caxcaattt? asasctacca 

190 200 210 220 230 24Q 

XArrrccsoT xagaatccca tcgatctttc rrcxxTsacr A»acaajur ctsckacca 

2S0* 260 270 230 290 300 

CAAACCtSAT TTCCTGTGTC CACATCTAAC AAATCUGAT CCCCSOCTSC ACTTTTGCAG 

310 320 330 340 350 350 

crrccrrccA AjGTcttccts accaccttcc aczxttscac tttccaacca cctsctzata 

370 330 390 400 ' 410 420 

CAAAACOAS? TTCAACAXAC TTCATCCCAG ICCACTOrCT CCTCSCT5CA GAAACIACCA 

430 440 450 460 470 430 

c anru agco acsaccccaa cgaoatatoa tagocccsca A<rrrccrsxs ccccaccagc 

490 300 510 520 530 540 

Accrroaccc gtsstcacag <sAC0Arrrrc actoaocts coaactctca csacsaccst 

350 560 570 530 590 600 

TACTAAGASS TTASSTOAAS TCCrTTAAAC CAAACS3AAC tCTTCATCrT AAACTACVCS 

610 620 630 640 650 660 

TTGAAAAXCA ACCCAATAAT TCTGTATTAA CTOAATTCTS AACTTTTC^C OAGGTACTCT 

670 630 690 700 710 720 

CAGGAACACC AGGCACCACC AGCACAATCG GCAAKSAGA GCTCCCCACC CCTZCCSCC? 

730 740 7S0 760 770 780 

TcccrrrcAt rrrrrt; . 






122 



10 2C 30 40 SQ SO 

gga-ccsccc GccrrsGCcr cccaaactcc TccsartAca cccxxcaccc accsctccw 

70 SO 90 xca 110 150 

€croActcrc csArrrcrrs ccacctctac czscrrcrtr? cArcraAcc aactcactca 

130 14C ISO 1$0 170 130 

a czxvc«w Arrc=rrcr ccraawraA aataagsats TtascrsHCs hhcctocctt 

190 2C0 210 220 230 240 

CSCCttTKG ATAAGSAr^A GATGACAtTA TA£AAS2mC CAAAAXTAAA AGCSCTAGAC 

2S0 2SG 270 230 230 300 

aaa«atttt atgaaaatas: AAAoArrAca Trsacrrzcc cccaccazas axaAftC<?AAr 

310 320 320 340 350 360 

GTTCAGAACA T7CCHKAAG CATTACrCAA OCTCCCCTTT TS3rOZS»AA TOCAWTCTC 

370 330 390 4CQ 410 420 

ACTCUflm^ CriTNTSTGW XTCAAAATS? trO>X ^^J CAGGCSGTTC CTACrrATTG 

43Q 440 450 4£0 470 430 

CTAAACACTC CTACCTTCAG CTTArAGTAA ft lSJJl ' C aO? XAGT70AAAO TC3ZGACAAA 

450 5C0 510 520 530 540 

raasacasr ccrsGrrrAC AAArrc^Tc: taraacxarr tsattscctt AAAxrafArrr 

S50 SSC S70 ScO 530 600 

acsvcwrrr AAraacaAt ggatgacsttg stsaaakct asttcagacc tsascxwa 

610 «C 620 640 550 6*0 

cccrccuCT gacaacagcc rrrscscTcc rru^cAccr skcctccag cacaacaca? 

670 €30 690 700 710 720 

caaagaaacc rrrcirrc TC crourorAA tctaiccaac Ktttrssas aacacsataa 

730 740 730 7€0 77Q 780 

woTacTcca cAAA c rxcrs rrrrrcrrtc ccttttcaga acctcaajgac ccrrrcrrr: 

790 800 310 820 *30 840 

CTG7CAAACA OTA lTilL^A g ACAGTTTtJCT AGACTTACCT GCAC27CGT2 

850 860 370 880 390 900 

cxccrTAcrr cca<saat=ca cagatgtcts AocAcaACca cctgagcaax act 



PROGRAM blastn DATALIB nr Begin >10H.ccn

TCAGAAAA?ACTT!^CSSCACA2^^ 

ATSCUUTSTAAXAGTAA?^^ 

AAACAAAACX^TATGAACSCTSAGAATA^T^^ 

AATCTC^KCTNAGGT^GGCSCAGC^ 

TCTCCCCTATAAGTGSATCC:!^ 

TAACCTCTCATCCACTOK^CI^ 

AXTCCA2TCCACTCOAAATACCTCC^GGTrACrt3AAAGC 

CCSTAGcHAAArrCTACAGT 



PROGRAM blastn DATALIB nr Begin >I024.csn



GTSCTCrAGATCTCTTCSW^TTCArE^ 
CCA6ACrTGCX«CCAATGGATCCTCCA^ 

GcrcGTAAAGAArccrGarrrrGAirrArrOT^ 






PROGRAM blastn DATALIB nr Begin >1024t-ccn

AAXCTAAC^C*AAACC?aAACXCCrCT^ 
NCACXCTCAGACATAAAXAXAAAOGlAOTTTCXACr^ 

XCrTSCACTOAGNAAACXGCrJACa^^ 

GXtfccAAXGACcarffTGANT^^ 

TTGNAAAGCXSTCTCCCIGSACAGKMCCCCT^ 
frAGrTC^GXXT2ftfl*AA«OT(^^ 

AX 






^f^Ue/cs- 3i) tic ill 



PROGRAM blastn DATALIB nr Begin >10S3.ccxi

ATCTGTGCTAGGTAGTGTACTAATCATTa^^ 

ATTCGCTNTGAiTatACACATAAC^GATCTCC^ 

AAGNTCATAATifCTAAGHAGTICTAGNATCGAGArc 

TACT GTTAACTAGTACCTTTACA CT ACT AACTGGGTAANCCATAANCAATTAATGATAAAG ATTG 
AGATTACTXCCLACATTCTCACTCGTT^^ 

TGGTAATATTTTTATTAX3GATAAACTTTCAAGNACTGGATKCTAGGTG 
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FHCSSAX biasm DATAUI3 nr Begin >!C92-«n 



cc»cAcrG2rrs«ccATs;iAAcccAKA(^^ 

AT7GG7TKtfTC:^GAC7AGTCACC?GAr?^ 

Efa^TTACAAAACACTGACCCXAGYtS^^ 

GGA^CTTA^CC^GCCATG^C^CATC^^ 

AAACCCACCrsCSrTCCTGXTTCCCAASffiTI^^ 

HGSAICOTCAAACGTTGCTCaAAOTHGSMCACW 

TTTAITGAAGCrGAAAGACCCTTGACrAGAAC 
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blastn DATALI3 nr Begin MXiO.coa 



ATICTIATTKTCCrrCACTAC^ 

TTGGATC??aGTAGCCSTTTCnrCAGOCTCCCTCTCC«AA^ 
<^CAO^TGGTTAGGCAaX3CGACI^C^ 

AGATCTCcrra^TTATTACT^^ 

TC?CCTTTCArrC7GAGTTTCTCC^^ 
TACC6C^A2lTTCTACTOTCaAGTCTTT 
GACCA 



<?^£Vt£/ ^0 A4 ' a - 2c 



PROGRAM blastn DATALIB nr Begin >UL27.con



A2rrTTCTCKJ<rr!TAAAAGCA CCTNA^A CATAATATAG TGGA CTTKCAATAAACACTTACCAAA 
TGGAtfAAATGAACCC^GGTCACCCCSA^ 
GtCCrrrTCrCCTTTACTAACCCXTNCrCCAAXCCT 
CCTGCGCiaCCXCGGKCCTCTNCCCT^ 
CSC^CCACCTATGAGAGCCTTrra 
NCAC^CTACTGCrrGTCCTTCZTGGATTiTTr 
GGCAAGATCCTTGAACAAAAGGAGCTATAAAAGGwC 
AAACAAGCAGGCACCTCAAGGAAACGTGACGC 



PROGRAM blastn DATALIB nr Begin >i247.csn
CTCSACA2CTKCC»?CATT7^^ 

OTCMXCCCTITCSAArC^AtfC^^ 

ATCgCCACCACTTCAACCCCAACTTC^^ 
CCA(KGAATTTAOSAOT 

TTAGGAXACTTTTCrCCTTCAGCT CAC2CXGAAACTCCCTCXCSA 
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PHCG2AH blastn CATALI3 n=r Boqia >Il£3,cca 



TCGAGATCTGTGCTrAGTOACATGATATTCrCwCAMCrA^^ 
AOTTAAAGAAAAATG^CAGTAT^ 
GTTNCAG2fc?AAG£AAAAAGXAAAGA^^ 
TCTTGACCATTAACrrGGACrTCATTT^ 

NACAGAGTAAHTSAAA . 
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PHCG2LU* Slastn DATALX3 nr Begin >119.ccn 



GACCCAGTAAAACTT ATCT CAXCAGCATAAGGCTGAA7G0G ATTGACAGC CTACAG AACCCGCAT 

TT7A~a\TGACGGOrrAG7GGGCG7TGG^^ 

AGCAACTrGXCCCTTACAGGGTCAAGCTAGGTCAACG^ 

TACCTGATAGCrGAGCrCAAGCTTATGACCGCCCAAGCTTCrCCCCAAGCTXCCCr^ 
TCCTCTTGATTGACTTCCACAGCAAGGXC 



PROGRAM blastn DATALIB nr Begin >I28.con



CCATCACSATXTACrGACTAAAAATCTCAt^^ 
CAGGAACACGTTACTTCCCAGGAAAAAAGCTGCCTCGGZ^^ 

TAAAAAX CCTNGAXCGAAAT ATTTTNTAAAAG AACTT GGGGTriTAATAXGGKATACTGCCCATCA 
AACAAAAAACCAAA TAAAA CTTCTTrcCCA TTTAT^ 
TA<^C7TAT?GACCTTTrATGC?NGC^^ 
TTCTA}rGTCTG2rArTTCAATGTCCGTA(^ACTJrATTTTTCAA 



PROGRAM blastn DATALIB nr Begin >I50.ccn



GAGTACCTGACAGGTAAG ATTG CTTTTTAAAGTTGTTTTAAAT CCATTACATGACTG AG AAAAG A 

AAAATGCACATT^ATTG TTGC AGTTTAAAATTTCAT^ 

GGGATAAArGTGTTTrGNTTTrGTTTTGGTT^ 

TAGAAACC CGTGTGGNTA CACXGGGTAAT CTTGTCAGGGOTAOtAAWCTTGGGTCTTG AKTTTGG 
TTA^rrTGGjrTTTA^rrTGGTGNACCCATGTACrrGCTCTTCC^ 
CNACG<7rrAAtfCCAG7GTCCGCGA2TCCnT^ 
ATCGTTCCNTCCCAGGATGGANTTATCATTATAAA 
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S&Jae?/^ x;; tic : 2- C 



GAGAGGCC CAGG AGCCACAAAX AAAG CAAG AGC CAG AAX CACAAGNGGAGG AA CAACAAAAG CAA 

GAAAAAG-JVG^UA^ACGAGAAGAACCCAXCG^GAGGAAGACGAtf^ 

GAAACCTACrCTGAGGCCCAXCAGCXCTGCTCCATCXGTTTCCTCXGCCAGTGGN^ 

CTAACACXCCXGGGGATGAGXCrCCCTGXGGXATXATTATXCCXCAXGIU^ 

CACC^CCTCACXJACCArAGSCtrttAAATAGSA 

TGCXCAGCCXAATXCXGTGAAGAGAAAGAAACXACCXGXAGATAGXGXCTrT^ 

ATCAAGACAGTGAXGACGTACCCCGAAAAAGGAAACTGGT'XCCCXTGGATTAXGGTGAAGAXGAX 

AAAAAXN CAAC CAAAGGGA CTGXAAACACXGAAGAAAAGCGXAAACACAXTAAGAGXCTCAXTGA 

GAAAATCCCTACAGCCAAACCTGAGCTCTTCGCrTATCCCCTGGATTGGTCTAXTGTC 

TACIGATGGAACGTCGAATTAGACCATGGATTAATAAGAAAATCATAGAATATATAGGTGAAGAA 

GAAGCrACATTAGTTGArTTJrGTTTGTTCTAAGGTTATGGCTCATAGT^TCACC 

AGAXGAXGXTGCC^XGGXACriGAXGAAGAAGCAGAAGITrrXATAGXC^ 

TGATATATGAAACAGAAGCCAAGAAAASTCGTCT^ 

CATTTCAGATTrCTTCrTTGCCACCCTTT^^ 

TGXGAGATCTGTAAXTTTXTrXXTrXGXAGAAAAXGXGAATTXTrTGGXCCr 

GC CCXGXGXACT C CCTTGGT7 GXAAAGTCATCXGAATCCTTGGTTCX CTTXAXACXCACCAGGXA 

CAAATTACTGGX AT GTTTXAXAAGCCG CAGCT ACTGTACACAGCCTAXCXGATAXAAX CTTGTTC 

TGCTGATTXGXTTCTTGXAAAXAXTAAAACGACTCCCCAAriATCT 

XXGAAAXGXACTGXAXAGGAACCAACAXGAACAATXT7AATTGAAAACACCAGXCATAAA 

ACCACCCC^CTCXCTTTTCAXCXGAAATGGCAAGCCCTTGXGAAGGCAXGGAGT 

AAXGCAAAAAXXAGCAGACAAXCCAXXCCXACTGTATTXCXGXAXGAAXGXGXTXCT 

CXGXAAAACTCTXXCTXrXCCCXAAXXXGCXXTGGXGGGGXCCX^ 

AATAGAAXXGTAAAGGAAAAGXGGXACTGXXCCAACCXGAAAXGXCXGXXAXAATTAGGXXAXX^ 

C7TXCCCAG AGCATGGXGXXCX CGXGXCGXG AGCAAXGXGGXTTGCXAACXGG AXGGGGTXTT CT 

TATTAATAAGAXGGCXGCTTC\GCrrCXCrrXTAAAGGAATQXGGA 

TAATTTTATTGCTCAGAAAXGAGGCAXAXCCCXAAAAATCCXGGAGAGw 

TTGCACTAAXTGGXCCTTAGXTrAAXTCXAXT^ 

CAAAAAGTGTAAGTGAjUUCCCCCrTTAAA^ 

GACAGACAGTGAGAGXTXTACVAAa\XGAXAGGTATTCTGCXCGGCAAXTTG^ 

XAXXXAAGGAXAAAGGXAAATCATXCAAGGCAGXTACCAACCACTAACXAX^ 

GXCXTGTAGAAGGTXTAXATCTXGTTTTACCITCGCTCATXAGTGTCT 

GXGCrrAGAGAAATXCCXGGGGCXTXCTTCGTrGTAG 

CCXGAAAACGXAAGAAGXTITAAACAGCXTTTCACACAAAXTA 

AGGXACXXATTXAAAAGAAAGGXAAAGATXGGCCXGTTAGAAAAAGCAXAAXGXGAGCT^ 
TACXGGAXXXrrTXTTXTTTXAAACAC\CCXGGAGAGGAttXTtGAA^ 
GAACCCTGATGXGGXTCCAXTAXGTAAAXAXTTCAAAXAarrAAAAAXG^^ 
AAAAAAAAAAAAAATTCCXGCGGCCGCAAGGGAATTC 
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PRCG2AK fclaatn 0AXALI3 nr Begin >170,ccn 

XCGAGATCXCCCCCAAGXAAAXGAAXGAAAAAAACAACACCAACAAXACAG 

CAGGCATGCATGACCTTArACCACCCTGTATTTATACAGAACCACCAGGAGGATAGTCA'XGACAA 
CTAXt^C^CTGAXCAXGAXNCCAGCAXXCAGAATXGAGTNCAGGG^ 

GTATCrrCTGTGNATGGGGXATAGATXAllCTGTCCATCCrTCCGCGNATAAAAKCrGACrGAC^ 
AAXGGTAHCCACGACCACCACCCATKCAGAGAGTC\CAGGCAC31AAAGAGCATGATCAACAXGCX 
XGGafCCATAXTTCAAXNTC^CTCCTCAXCTTCre^ 
TTAACCcfGGGGTCGXCCATXAGAXAAXGGCXCA 






i-TGTGGCTGACACAGCAGCA" 



:ttgacaag: 



:C\TG~ ACGACAAA 






r * r"i~C — ~:3""ACATCTTAGC AT CATTCAAATAACCAAGATTAC AACCTCAGSAAAGA * «» « 

^^^^^Q^CTCTTCCAGC3CCCGTCACAGAC3TACrCCCrCTGAGGCCSACCGATt3wrTAGAA 

i^i^^^cccacrsccArcrcccAeccA^ 
ccaga^tc^ccaacgta^sccactscag^^ 

CCAGCC^GG^CAGGCAGCAGACATTCCCICACTACGAGGCAAGCACrGCTACCACCAG a CCCTTw 

^£cAC«GACA«CArACA<SA^ 
t^A^A^GA^ 

TACCAGAC\GGGAGCAGGGGG~AGC5G7CAAAGGAGC^AAACAGA!frrToxCTC 
GA^CAAGAAGAGTGGTTTGTGCTCAGGCTGGGAACAGAGA^^ 

AGGAGCCCA«ATCAGCAACTGCCCAGCAGAG< — ATA.. - --gg™ " ^^ T^- — ^zzx-j 1 
GAGTAACAGAATAAATATTATATATA-CAAAAGCCA^lAATC^ 

, cr*CAGA*\^\AGAAGTrGrATGAGrrGTAAG7AATCrrG^-AAAGN»*A^CKs« 

AAGgISg^GCACGCAGACAGGATCCC^ 

gaSg^attgtcctgg^ 

CA^GAAATATGCTCTATAGAGAATATATCTTTTA^ 

CATCCCCTTGACAGTTGCAGCCTCTTGACCTCSCATAACAATAA^ 
TCTGCrTrAAC\AAAATAAATGTTCATGGTAG 



MA 



GGCAGCTAXTTACATCGCCTCACAGGCATCAGCTGAAAAGAGGACCC^^ 

TTGCtGGTCTrGCrGArGTrACAATCACHCAGTTCTATAGACTGATCTATCCTCGAGCCCCAGAT 
CTGTTCCTTACAGACTTCHAATTXGACACCCCAGTGCACAAACr^ 
AGYTAACGTCXAArrCTTGA^JNACXAAAO?^ 
TTCAGCCTTTCATAAGCNAAAAC^AAGACATGGN^^ 

TrGGCArxcrrcTATciTA 



PROGRAM blastn DATALIB nr Begin >227.ccn
AAAGGGCTAACCAGCCACTGCACCAAAATTAGTCOT 

GAAAAATGGGAAAATTCAACAATTTCAAAGACTATGATC C CTCTGGCTCATGATCrACTGACCAG 
AATCAAGTCCXGAAGGATTTCCrTCTGTTAXGTTATCT^ 

TGGAAAGAAC\AAGCCCttrGAAGCTACCCCTAGACCCAGAAAGCCAAGAACAGGGCC^ 

TGAACACCACA<»AGCCTGAAATA<SAAGTtK^ 

HCG7GGTAAAGATTCCGATATCAAGCTTATCGATACCG 



PROGRAM blastn DATALIB nr Begin >2 6 5- 3 25. con



actacatsstttc 

TAGTAGACCTTT 



rrcCCCATCTCTACCSCTTSATAGTCrCTCAGJf 
'CrNGrTTA<3KCAMGCCACHTTrTrAAAAACrcCAGACS«TACCCrc 



rrsjrccACcccAsccACc: 



TXGMAGCCSACtrTSSCCCTMATCA^ 
ACAATGGTGtfACCCCTCCTAAGGCCt^CCrGAGTGTCC 
rrCCCACGCCTNCACCCTTCrrXCAAAaCCCAXAffi^ 

rTGGGGGAXGTGTGT 



PROGRAM blastn DATALIB nr Begin >293.c=a
c - c ^~,^;^XTACAGTTACSATATATGAAACSTACAAAATATrATG 

A^AA^^AGTTTGGCGGArCAGGGCACArrTCTCrAAGAAAGTGACAITTGAATTCACCTCrG 
llP-a-i^-AGACATTACCCAGAAGAArAAAAXGATGGGCAAGAAGGAGGACATTTTCCGTAGA 
^CCAG^GCCCCKCTXGATCCCTrATCCACrCATCACrHAGGAGCATArTAAATXCTATAGAA 
i^^^GSAAGACCiAAACAGACCCTlIATATCTCSAGAGGArCCAGOJAAATTCCAAGAGACAC\ 
ACAWTAAGAAACTNGCAAGGAAGAGAAAACSC^^ 

HAGNACGGAGACAAAGAGAGAGGGAGCGT^AAGGG^ACGAGAAACGCGAGIIACGS^GACGAGAA 
AGGGNAAGAGinVCGTAAACG 






2 x ^ <"=m 



GGAAATvUArSACATCTCACTCGTCGTATGGATTKAC^ 

GCACCGCAAAA*GAAAGCCACATCCCAGTAAGGGGTAGAGAGCC^wCAACAGL 

GGCTGCCGCAGAAA7CAAAGTCTAGGAAGTAAGAGGTAAGAGTGTACTACAGGGGACATACCICCA 

ATCTC^TGGT^CCCTCCCTCTNCCXTCCTCTCCCAGAGACCCAGGTCCC^GGGACTAT^^^^GGAT 

CTGTCTCTGAAGCTGAAAAACA^GGCAGAGGAGACAGTCGG^ 

CAGCTTGGTCAGAANTCCTAA 



PROGRAM blastn DATALIB nr Begin >295-43.c=n
es~r^craGGAACCCCTCAC^GACT 

TCCCCTGCrCCr:rrCCCTltCCCCAGCSCSAGATAG<UAACCGGAASCCrGGSCAGGCrGASCCCA 
MC'GACTGGAACCAGGG:iAGANCCTG7G<KTGGGTGGITAGGGAGGGAAGGAGGCC^GATTCCTC^ 
AGA^CTGGGGxUGAGAACAGGTrTTGGAAGTTGGCGGAGGCTTTGGGTTTCA^ 

ATclLScCCrCGACGGTrJfCACAC^CCra^ 
CTTCCTTTCAACCCTCCNCNTAAAAAGTITTGATirrTTTAAGG 



ac^AAGAGCCCCCAGTTTATCJTTAACTCrCATGACAAAC^ 

A~C^ \GGGT AAATCCCTGAC CATGTGAG AGGAATCCTAGTGCCCCAACAAC CTCACC CCCTGACT 

-^X^Cj^GSCTCTGCC^GTC^ 

AGAC^GCG^CAACCTAAATSTCCAXC^ 

CTAGGAAGGAATGAAGAA2GTCTAT 



5(?/&evc e • a/- : 3(c 



PROGRAM blastn DATALIB nr Begin >31<3-ccn



AAAGCGAACAAAAGCrGGTACCCGCCCCCCCCTCGAGCTCGACGGTATCGATAAGCTGGATATCS 
AATCCT CGAGATCTACCTAAAAAAAAAAAATTAACTTCCCAAAXGTGGGAGTGTACXCr GTTC CC 

NACrKXGGKGTGAACATTTTT^CTArTATAAATYCr^AGAA 

TXCATTCCAAGTGCCTXGTAATTTACrrCT 

TTYCACTATTTAAAAAAACAGNAATATCTCT 

TTJf C CTTAGGOTAAAAT CCXAGAAGTAGAATTTTTGGGGCAAATTATCTACATATTT ATAATTGT 
CTTCG7ATTCCAAATCTCGTTTTC:^AAA^ ' 



PROGRAM blastn DATALIB nr Begin >322.ccn



ATTTAAGATCACrCCC^TCr CTNC^ 
TCCTCTACTTAJlGGi^NTGGGGACCCTCCCAAGAGCTCN^ 

T:fCTAAATCcAC\rGGNccriccAA«;r:crcATccrcTAc^ 

ACTTGTCXYCTGANATGCrrrCTNGAGGSGtfACAAAACA^ 

AACTGCAGAGAAXGHAAAATAAGTCCATAGGAGAATCTTGNAAATAGAATCATCCKCCTTTACA^ 
ACTCTCACTCCAGGAAAACTGCCAAGAACCACTCACT 

CCCAGACTTCCTCCCTTAAGCACGTCAGTATTCTCCTTATTCTCCCrTCATTTCXXCCCT 






$gjfcus*/<:e J£> ^s c 3? 



PROGRAM blastn DATALIB nr Begin >323-ll:27.ccn



CaAgATCrCCCCCACCCCACArTTC C T rru i 7 GAATGAGTAGAGAAGACTGAGAAGTATCACTCA 

CCCGTGATGTSGTTTGTCCrrrTTTCCAGCC^^ 

TGGTCCCCSTAATCCCCGTCTTTCOOTGTCC^^ 

AAGATTTGTAGAACCTTAAACTGGAAAGGGACTTGG^GCT 

AGCATCAAGGAATTAGAAGTCCTGAGAGATGAAG^TC 

TCAAGCCArrCCCCTCGTTACTG^ATrGGCCACAACCCTTCCCCCTTGIITATCCXCAr 
ATCCTGTTTTTAATGGCCrrGCCAGTCTGGATTrGTCTCT^ 
GATTCCTACTTAAGGGAAGAGAGGGGCrCCTCArrTNTCACTTGTA^ 
CrTTACACAGGTGTCAGATGAACCGTCACA^^ 

AGGTCCAACAT CTCGACGTAAGG AG CGTTCCCAG' n ' CATCCTCAGATAACACTN CTAACTN 

CAGCrGTTTCATCCCNAATCCCTANTTGAGGTCTTAACATCTA 

AXirCTGTTAACCCTCTNCACCAGANTTAGAKCTGACTGATITCAcrTCCTAG 



PROGRAM blastn DATALIB nr Begin >424-216.ccn

TCATACTTGTATAGTTCaNTAAGATAATCACT 

AAAAACAGCCXYANCICAGTTCTOTCICCCTAA^ 
G*f*ITC7GTGCCATNAilANAr}ITnrCAJrTTG7Aj^^ 
WTtfTHCT:fCTC^2ttAC2ICCCSCXCCAH77^^ 
TG 






SGfLiGNM* 3*) "f^ 

PRCGSA2* blaster D&TXLZ* or Qe^in >424-237.«n 



CCAGACTrrCATAACTtfCTGTTATrAT^^ 

ACSAGGGGGTAGCTGCCCCAATATArrCTAATTTCTCTKGAGGA 

TCTCATAGGGAAAACGAAGAGTTGG^CGNATCrTAGCCrCrAGG 

GGCC\CCAAAGTTACATCTAG7XGCCTACAAATTTACTTCCAAA 

GGTCCTG*fAAACTGATGCCAAACTA7AC7TTAGTCTNCTA^ 

AATTATCTGGGMAAACAGACCTGATCCXAACACAGTTXGCTKCT^ 

AGC CTGTY CCGTCTACTlIGGGGTGTCTnCG ATTTGCTCCAG 



PROGRAM blastn DATALIB nr Begin >4S0.can



TT Tl ' TTT CCACCAGACrTACCAAATrTTAGAXG^ATCCAAGAACTGTAAATN' CCCAXAAAGMTAA 
TCTATN CATNGACCCCCAC CAXTAXGAXAGAGAXCAXNTGGXGAMTAAXGAAAGAXG AAACTCXC 
AGCTGGGAAAGTAANAAGGAAXAGGAXGTAAGXAXGAGCTCCXG * 1 ' ITI ' i 'ATTATNTTTATGGAT 
GCCCCCTCAGAAAAATATGNi^CCGGXAA^ 
CCCACrCACGAGGTTT 



PROGRAM blastn DATALIB nr Begin >*6Z.con



TC CAGATCTAAAGCAG ATG?TACACTIT!TCAOf AAATAAArTTACTG CTTTTTTY C7G7GA2T ATAA 

GTTMCGAG AAGGAAAGCTTTKG ATTNCTHNA- GAGTX CAGTGGAXTAT Y C7NAGJI ACTAGAGTXG 

HKGTKGAAGNCArGGNACAXTTATATAG^TYTrrTCAGTTCTACACTAAATGATG 

AATCCTA?AT3ACAAA~AGAAAAGTY»TYCTY«T^ 

AAGAXCTCGA 



PROGRAM blastn DATALIB nr Begin >435.ccn



CrTACrrTTT ACACT AGTTTCArTATACTACO 

GATATATTTGTTTTA ACAT ATS X IX 1' i'„ 1 + ITAGCAGGTAAAAGAATCATAACAAATCTTITTAA 

AAGAACATTAiTAXrCTTTAATAACTGTCXTTTTATGCAT^ 

CATCTTGCSTATTTTAIAAAAAGAGCWAAAGCTCAA^ 

AAArTAAAX ArTTAA CAAATCTCCTTCCCTrCHCCCTTCCCCATCCCT 

TATCTTTAACTTTTGGGeraCCATC 

GTCACCNAACTTGGC^GCACAAATAATCXAGTCTTACr^ 
TTTTCC^GTTNCAGT TCCAA ATGTTTTGTGGN^ 

AGAGGGGGAAACCAACrrTCCAGTGTTGGAGAGCACTGNATAGTTTATG^ATTGTGTAAA 



2S53 SEQUENCE ID NO. 44




ACTrCTA^CCCXGTTACrcCjuAAKCr? GAGCCAGSACGAATCGC 



3SoO SSQ^STCS 13 SC. 43 



Tscrcr=c^c\ca?£AG^^ 



:agagaacc:: 



CACU3CACA 



TAATGAAC 



tJ^ACAAACAAAA 



:cac-cga 



3552 12 SC. 45 






ACAG:;AA£C£^^crrcTArc^^ 



cuc£^T5T2CA:rrrsc^^ 



3570 SEQUENCE ID NO. 47



r3TGAAAT~ 
rACAATAAAACXAACACAOTGCCC:^^ 



TACACrcrAAAATTC-CCTCAA^^ 

GA:ie ^A^a crcGcrccATCTA<srGGGrG s,. , ^ .t: ^ua aataattataactcojcat 
CAAArrrafssGGG^accrarr^ 

AiTCCACTOG^rTGACAAC^^ 
TCfCATCTGCAGAGCAATTG 



3571 SZQUZtfa 12 WO. 43 

AAA A^'J^ , WTOCAAA7 CT^TCn^GTTCCAXCTCrA^ 

ACTCTirtANTTQAGTtTTAGAATotGAACrT^ 

TCrSAAAQfOAACTTTTAK^ 

GSTT?£ AACSaCAG^TT^ 

GTTTTA^GAACAAAG^TTSGC^^ 

CACAAACCTTtTrAATAAOAAfftTArrCACAA^ 

scrccsacqrr7AiacA r ;g7crgrGg cx " jcjiT 



^GTcrcrcocrscriTCAC 

5ACCA7ATGCCCCATACAGCTTTAA 



A7CAAGCAACAG7GTG77A77C™A7AC7C^7DT77A7A7G7GTG7A77AAA 

7TC7G7A7A7A7:rr37A7S7A7AACT:rr crGrcT^TGTArcA-rsArTcr^cTccc^r: — 

GAAC^7GAAACAAAGCACAC:rr7:A777AAC£^^ 

c<^AAAraArrrArcr cczActTr3nAA ; „ A ^ 

- * - -Au* - .,:TAGG** *C77GC7G7,i . *<-»C£CAGGC7GGACrGCAGTAG7G7GA7 

c\7AG:r7CACACAG;rc7C7AAcr7c:3tGC^ 

c-7Ac:rr3GC-Ac: * 



3573 SEQUENCE ID NO. 50

CAAAAAATCAAAGGGAACT7CGAAC7CCTGCCCACC7CrC^77CC 7CAC7C7GC7 GG75 

GrGtTCrgCT CTIC CrCACAgrAC— CC^AAAAGT:CAGAA7T:^G77Aa?ACAGAA:rA 

TTGGCrrr^rTTrCAACGTGTAGrrrA^GAT^ 

ACC7AAC:r:C77GG7AAC:^7AG7Cer5ACAC7TCSCXC^ 

GACCACGCG7CAAGC7GCXArGGGGGAC\GAAAC77CCGGG^C7A7» 

:rCTCGGCCC7CAAA7C7GG7AG777C7GCACCGAGC^CACAG7CCACrGC^ 

7S77^AAAA7CSa . . J UiT*ACGGAAC7CCT7CCAAAGTCCAA7AG7GNAAGG7GG7CAA 

GGAACGA777GGAACGAAC^G:aAAAG7:^Gr;CGGGAA7 

A2CUrAGGAAA7CACr^GCC 



357? SZZZZllCS ID SO. SI 

GGAAAG£GG77r777AACAC7CAGACAG7G7AAAAATCCA G , . . . SSTTTTSgaSS 
GAGACAGAG7C7CGCAC7GT7AGC7CA£^X7GGAGTGC;G7GGCAC 



3£aS 52QUc2TC2 ID HC. S2 

AG7CCCAGC7AC7CAGGAGGC7GGGGC\C<-AAC^r^ 
G7GTOAGCi;7SA7:^&C7AC:^^ 

TAAAA AASAA AASAAAAAAAAAA7A7A7S7A C; *' A' !* L^'l'GGAATTTCAA^GTGGCSAGArA 
AA7^T7T77CCAGAC3iG7A7C7?GAAACCCAAAGT77A7GCr7AAATA 
TGT7TCACC77aAAGCGGGAGAAGAA7C^7CA7ACACACACACACAC77!ATAC^ 
A7AXATACAAAAZACA± » a-'^"AA7ACACACA7A7AAACA7GGAG7A7AGGCA7AAC\CA 
CTGT7GC773A7AAAATA7AGGGATCC 



3535 ID 270, 53 

TATATT7tTA7CAAGCAACAG7G7GT7ATGC CTATAC7CCATG77TATATGTGTGT A77AA 
AAAATG7A? T7TGTATAXA7G7G7A7G7A7AAG7G7GTG7G7G7GrA7GAXGA7?C7CCT 
CCCG72J7TGAAGG7GAAAGAAAGCACACC77TA77^ 

TACr J : UTUGAAAAATGATT7A7CrCCCAC7TTGAAATTCCAAAATACGXACATA7A77T 
T7TTr77C77T7C7777T7AGT77?TAGGGTCrTGCrGTG77GCCCAGGC7SGAGTGCAG7 
^^57GA7CA7AG2Tr CXCACAGGCTCXA^CTCCCAGSlTCAAGC7AirCTTCC7GCCCCAG 
NCTCC7GACTAGG7GCGACT 



C7GCAGTAAGCCACC777A -2CCAC7"AC7 77AG CC7SGA~ACAGAGAGAGA7CC7GT 
CTTT5GAACAAAAAAA:^AAA£AAAAAAAAAAC^^ 

GCCACAGAG7ACG7ACG7GAAAGC7G CC7GGG7 77AA7GGC7GG? AG7A7G77C7AAC77 
G7TCACC7 ACC:i\rG77AC7AC^^3G77ACkGAA7G7SAA7C7 CACAC757CC? AAA? 
CGG77T7A7T777AAAA 7 GAA 7AA77C7A ? TACA77AG GTTArAAAAAC 7AGC7AAC 7TA 
A7777C<nrr777J^AAAG7GAA77GAGGGCAGA7GCA^ 7 7CACACG7A77AA7C 7CA 
AA7ACC77GGAGAGGGCAAGG7A(X^GGA77GC7TGGAGCCCAGGAG7C 
C?AGGGAA7A??G:?AACAA2fG?CCrC?CrXCAA7AAA7AA? 



3335 SSCmCS 12 NO, 53 

GGGGTGCAG?GGZlAC^CTCATAGC?GACrGC^CCrrGAACTCCC?GGIT?CA?GCGA?CC 
TCCCACTTt^CTCTWAltTrAC^ 

77G777A77:iGSGC^GAGAGAACG7 7CT7GC7A7Ar7GCC7AGGC7GG7G7 77GAAC7C 
77GGGN77 C3UCCaA?CG?CC?ACr?KGCCrC7? CAACG7*? r:GGGArr?A?A C-G?G? 

7Ar-X^^7G7i^ 

G£> 7 7GACAr7C7G7X^CCACCAG7GG7GAAA7GGG7CCCCGAACAAGGTAGAACA7A 
GCAGCGA77AACGCCAGGGAC7G77CUG7CCG77 C 



3533 ScmGX:IC3 ID SC. £5 



:07GC7rCUA7AGC7AC77AAG7GA7GAG7AC^C7GGGC3C^ 

X<?3AC<3C7GAGGCAGGCAGATCAC77GAGGTCAGGA 



CCAGCAC 



:aaaaatxcuaaat: 



7GGCCAACVIGGi{nAAACG7CG77777J 

<rTGCGCAC7^AG7CCCAC<7AC7CGGAAGGG?7GAGGC^ 
CGGGAGCC^GAGG77G77:iG7G:CU:CTaACArGACGCaC7? GCACTCCACGCTSGGNAA 

CAAAAGC<»GACC77nC7«AAA^ 

ccc7JuiACArcrrrr:c7ccAAGAcc 



3530 



ID NO. 57 



GATA<»ACTAaGC3CA75CAC:^^ 
C^7SAGATCTCACwTt3777£C^ 

ITvII ' I ' l U - U ' l J J UgQACACSSAGTCra^ 
SC3T:JNA7C775GC:CAC77G;UAGC7TCGCC?CCCAGGCT^ 

CAGCCT CCCAAST^IGCrrSG^CTAC^GOTATCTGCACCACG^CCGGTTAj. i' 'uJ I - GGG7T 
TG2J2TO^GGGACGGCG7T7CACGA7GT7AGGCAGGA7GACrrCGGAC7T COJGACCCAAG 
A7CACCCTGC7CSGC7CCCA 






3 = 31 32-;rs?cs ZD SC. S3 



caattc;^gacgac<::7::;cgc^ 
T?AC^AAAGT:GArG:^^Tcc:rccA^ 

gataccttgagc ctgccagttagagg c w 7 gt^agcratgatcacactactgcactccag 
cccgggcaacacaccv^gacc^aaaactaaaa^^ 

AAAGrr:A~crTAAAtAAACGT— G c r: 1 . i * cac^cxusgccgsagaascatca 
TCA^:cAC\c\CA«CAraxrc*r:TCACAr4 * j^cAAATNCAATT^rcniAATACAACA 
CA:rrr?AACATGCCGrr7:G 



35 92 S2Cu3HC3 ID SC. 53 

TT37A:r;^rc<iArcA7r:AATn^ 

CCAGCACTCrCGC5«SCT3aC^ 

TG5CCAACA7GG?IAAAACCTCGTCTCTACTAAAAACACV^^ 
*T3C3C\CrTCTAG7CCCAGC3U^CS^^ 

C^AA/G^GATC^rTCTCAAAAAA^^ 

ccgcagacatc ,r:rr zz cacK&crraasvrrc 



3534 13 SC. SO 

GAArrCCTGdrCTCAAGTOATC— C7CACCTCAG CCTCCCAAATTCCTGGGAT7ACAG7G 
-rgaSXaCTSTSCCTAGCC^ 

GAj\AATr?ArsTc:rACC7Ar^^ 
ArGTrrrAAACTAATArrrcccxiAGTArA 

C^CTTIIQAGGGAAAAAAGTGAArTAITGG 



3535 SSQU3:rC3 ID 2*0 « SI 



'GGCTGTGATT 

rrGGAcrGrrrr 



'CAG7ATAT5CTA 

rrccr 



TCAAGCACCTCCCTGAATGGACTGCGTGGCrCATCT! 

AAACCCAAGACTGACAATTTGTTTGTCACAGGAATGC - - - _ t . 

CATCTGTrTATCrrr^rmGAGAAAACGGTAACGT^ 

ArcArrAArrAGCTATAGCAAcrrTrrcxrr7GAAc3ArrrcGGcrGGGCA 
TGCcrraATcrrAGCAcrrrc^^ 

AAGACCAGCCTGGGCAACVTGGCAAAACCTCGTATCTACAGAAAATACAAAAA^ -7G7CG 
G07ATG 






C CCTTCCAGTGAATGGAAArCArTCCCAC ZACACCAAAArTCCAGAXCAGGAG wNAACA 
GTAATGTAGTCCACAC^AACrrA^^ 

CCAAACATTAGA-ArTAATnTCCCACCTTTAXAATT^^ 

?Grrrsa^ccvGT3Tr:cArrr::*3^ 

GGCTCAACTGXAGTCTTniC « 7 GGAGAT CAGGT^IGGTCTTCCCCAGC 
TAACTrGGTTSGCAIIAAC 



3510 SEQUENCE ID NO. 63

TAAACAAC^CCCrCAZ^GGGCACTAArC^^ 

CAC^AArOTUSKACAGOACS^^ 
CAGATTTSUAAGArSTAGUtAATT:^^ 

TCCCAXAC%rrAGArArtAAr:r:rccA eL : : jA TAATrrTA CCATAACCZATATC A ACTg 
TGcrArrArrrArrTAA^crxrcTctaAAr^ 



3512 SEQUENCE ID NO. 64

CTtArc^AG^czrcccra^^ 

TGGTAAAACCCAAGACT^Ar^rTTGT^ 

rGArrrAGAGAAAATGGTAACrrGTACATCCCArAACTCTTC 

rcArrrt^c^rrrcsGcrGGGCArGGTA 
;aggcssgc^tcacttaagcccag 
Affrra^ccwcrrssGaACATSKAAAAccrcK 

AOCC7CG7AT 



3532 SEQUENCE ID NO. 65

GTC\?GGTCTTGGC^y^G7GTCT7^ 
GAGC\CCGAGGATAACCAGACT7CAC7^ 

oficrrcrrTATTGCAXca^TTa 

ACTrCCZAICraTCCCSHAACrAA^^ 

GTcrrGGccrzATrr^cccAGccccTArrcAAA^ 

CCTGACACAAGGATTT 






j^GGCAAGTAAAC^CtTTGCVCACAAAA^ - * AAG AAACT?iG 

CArrrrTAA:rcTGCKXGAc^^ 

TCCArACATTACTAArAGATTArACAGAr^^ 

TTGTycAACArgCAAssTTAc^c ^ ^ 

tttgtg ^Trrr:AcrACK:c:rccxcTr=Ar cxxArTT*AGC \ A — : : r ccnrTTAAC 

GGAAcrrATcrrAccrTAArrAArAAGAcrr^AAAX -552-5 

ACAccAAC-^rcrrrG;^^ 

T^GC^<UGGr^Gcr:AC7cr:^^ 

AAAACCAAGACS<rrGAGGAGAGTGACC7:7^ 

TATTATACA^AGCAISCTA^^ - -ACAAw- . 

HCCArOT2aAC3T^CCC3GASG»TAJtCrrG 



3S42 




GATTACAGCCATGT 

^.CGArGTrGGGCAGGCrGG' 
CrCCCTiAAC-TGCTC^GATrTC^GGC 



GGTGGATrATTCACGAGTTTrCtGOGAA^flAGCSrrGGSCAr^ , , _ 

TGTCATGGksrrGGGGGC^^G7C7rr 

ACACACAAArrcrrrrc^^ 

ATTCTTTCCTTITCTAACrrrGGTGGATTA^T^ 
ATATATAAAGTNTTGGTGCCGC:^^^ 

Ar^^AATTT^GAC^ 
GTAGKAGGTAACTTGAATGA 






SSQC3C3 X3 50. S3 

CT^CAGAGt^TTGCAACTuGALri , J * AAGATAATGTCACATATCCACw- .CCC^ 

TTTACTTCTGACrcrtrrTArATTTAGGAT^^ 

TAGGGAGAT3TGTTGT jTATGGArtCTTCTAAGGAGAGAATTCTGCTGACA* w*C--A.,j. 
CTGAAGTAAGCtAGAGGTTAGAXGCTAAAGAACAAAGAAGGAC^ 

ACGTGAAGCCACGCTACTAATGTGGACTGnCTAC CTCTG? AC7ACTCTATGAGAGAGAAA 
GTA7 "GCATTATTT 



3*sa SEQUENCE ID NO. 70



OVTGCTwrYrtjrCJCTGTSC-C^^'vj^ - AAGT3AG - 

<^^CrrrrA^CCCGGAAG^GGG^^ 

AAGCCAG7G35Cr-'i - .Vw. 4 .ACAGCATSTCArCAICAC wiAAGGCCTw^CA. -GAA 

GGSSCATSACTtaCCTC^ 

AGGGGTTArTACTtCArGTTrTAAGTGGAGAAAAGClAAC^ 

TATG^ATTACTGCIArAGGGCrnAAGTTArGCrGAA^ 1 GAA CrtCA3AAAy - u * » - -T ^fi ^ 
GGG2IGGAT7AAATTCCTG7 



3SS3 SEQUENCE ID NO. 71

TGCArCTCrAGGACrCTCArGGGCnCCAAAJSA^ 

AcrTTSTCAACTGTCrGAAAAA^^^ 

GAGAGCATCrAACAC\GNAC^AAGGGGAGC^^ 

c^cAC^GGCCAGirrcrtrrcTrc^ 

CAAA7GTGTGAATAAACT 4 



2 57 Q SEQUENCE ID NO. 72

crrrscaAAcTTcvrxsArrc 

^TT^G^TSTTACAGTC^^ 

AATA G " GC 'l" I VX' r c: i ' >^ iTl - VI , «, « ^S, „ 1 - ^ I 1 Vl- *GSGGA:tACAGTCTC.jw * 
CTCTCGCclGCTrrGC^GTGCAA^ 

r^CAGGGGTGAGCCAu rSTTCCTC-GGCCTC 






36~: SEQUENCE ID NO. 73



rrceAc 7 TGAGcrrrr^—GCA? ccAcrrtAcrcc? AGertGCGCAA- cu^Ar cagaca 

CTCGCT? CIVACAC^AA^CAAAAACV^^ 

aaataaataatagcac^gttgatatagg— Ar^AAAArrArAjuu^rsGGATArrJur 

AT — AAr C7~CCCACCCATCAC;r7ArrC7AAArAAT^. Z > :GGTGGAAATTATTGTAC 
Atw> ^ . * AAAATCTGTGTAA * ^CAGSSAA tt ^ - I J A AAACCrArAACGTTGCTG 

T^ACTAcArTACTgrrwCAcrccTGAr~3GAA, : /r uw r^r sqrasGAAr satttcca 
rrcACTGCAAAcxr:ccAcrrc^crccAGC\c^cArArrrcA 

7GGC -CTGG7GTTTATCAAGTACC? CCCTGAACGGACTGGGTGGCTCACCTrGGCTGrGA 
7TTCAGTA7 



3 £72 ID NO. 74 

aactacaggtgtgcaccaccatgczcggctaa' X ' i : i : g7 a :i i' t ci' V a A gatacgagott 

TrGCCArGTTGCCCAGGCrGGTCrrCAACrcrGGGCr^ 

TCCCAAASTGCTAAGATTACAKCATGAGCTACCATK 

AAACT7ACTATAGC7AATTAATCATTTACTCAAGAGTTA7 

TTCTCTAAArCAAGATAAAGAGArGAGGAAAGAAAAC\CrCCXG7 

CAAACAAATTATCAC7CrrGGGTTTTAC7 AXA? ACTGAAATCACAGCtrUGAZGACCCAC 

GCAGTCCATTCAGGGAGG7ACTTGATAAA 



3571 SEQUENCE ID NO. 75

^CTT^CGTTCCCGACCCGAGCCrSGTGC^^ 
GGCKCATCCGGATGCXCGCSTrtXAGGCCA^ 

CGCTGArCiSTCACGGCGArrTAtCCCGCCrCGGCSAGaa 

TTGTAwGCGCCGCCCIATACi. ; r U TCl' UC CTCCCCCGCGTTGCGtCGwGrGCArGGAGC 
CGG^CCACtrrCGACCTGAATGGAAJICCGGCGGCACCTCGC?AACGGAriCACCACTGCAA 
GAATTGGMCCAATCAATTCrrGCSClAaAACTGTG 

ACArATCCATCGCGTCCGCCAXCrCCAKCAGCCGCVCGCGGCGCAtCir^ 
GG7CC7GCAG 



3 €74 SEQUENCE ID NO. 76

CTCCAGTGTTTAAAAAATAAAAl^AAC^^ 
TTGTAAACACVrG7ACAAGCCATATAATAGAGTTCAT^ 
CTAGAAAGTCr^CACCCGGCCAAGATAACACArCr^ 
TTATSGGTrGTTTACTTAAATCArAGTTT^ 

TcecAccAcrrarscGCcrsAGgagcc^ 

CCTGGGwlATaTGGCAAAACCTCATCTCCACTAAAAATXCV^ 

GGTCCACACATCrrrAArrCCCAGCTACrrGGGAGG^^ 

CTAGGAGGSAAGAAGTTGNAGCGANCTTAArS? 



3 723 SEQUENCE ID NO. 77



CACTCV^TTCTGAATGCTGCCATCXTGATCAGCGTC^ 
CTSCTrCTCnATAAATACASCTCrATAA^ 

aaaaat :^ ■ :rr stcr?r^ACAtxun , raAr :rr r A G i : . r i: r ccTCCTCACT57SGA 

ACAtTCAAAAAATACAAAAAGCAAGC^GGT^^ 

GGCAGGAG«ccGcr:GGGCccAGGAGT:c;c^^ 

TGC~C7A~AAAGAAAACAAAAAACAAArAr^ 
7AIGT-\TG;uUAAArrAG7GTAAA 



3735 SEQUENCE ID NO. 78
CCTSTATTnCACrGAACCACCAGGA^^ 

GGCAc^TrcAGAArr3AGrGOGGGC7— ctggcccacagtctcggta-cttctgtgaa 

TGGMT^TAarrcrAC^rAAAAC^ 

cACcx^ccAc\rriccAccrccAAc^CAGAA/r:':u : - ;■ iaact caattcgnacct 

TATAAG? CACTTTTCC CCAACTCACCAAC7 CTAGCtAAGAArrTTTAACC'TCAGAAAAAC 
AGCrACACrCTAAAATT^. rCAA^GAAAATGTC7AACATATGGAAAGAAGGACT7AACA 
TGTOAAGCAGACACrGGCr CCArCrAGTGGGtGCTrTATATTGAAATAArrA?AArACGT 
CA7CAAAT7TT7niGGG7AC\G2rrTA~ 



373 7 ID NO. 79

cArrACATAAr3G»ra<^ 
TArratCArrcrAraAccAaAssuAA^ 

AAC^AACSSTTACACrrCOTAt^CXAC^ 
AAGCAAGSAAAAACAArXACAGffi^^ 

qrrcAcrsAAg5c\?rcaGcrrctrcr L a ' : u ccisggt ^ ^ * - GcrccarrcrcTc 

TGTT^rCCCAACArA«CAATTGrACr^ 

crGCArGAAAAcrrGrrrrAc^GGc^ 
7G7»rrrcAiaccuaTAT^^ 

C C T't G TGTCAGAGCCG 



3735 ID NO. 80



AGGA7C 

TATrAAAGAAAACAAAAAACAMIATTGGAAGTATT^ 
ATGAAAAAATiAGfrG7AAAATA7ATATATTATGATTAG^ 

TATGT7AT7 ?7GGGATTrCAATGC V:\ ; : rA COCCATrGTCTCAAAAAATAAAAGCAGAA 
AACAAAAAAAt^TCTAACTSAAAAATA^^ 
TCSrrTGTtXGTTTGKTTGTTC 



TTT 

AAGTGCAG 



gaa~g c: : : . : t » , . : — . « . t . . « . ^ rcrrrrAArGTrrrrArrGTyccrrrAGA 

TG-IGGCCGTArGACrrTC—GTCC^—C^A^^ 

crATcrCTccarrACAT^rrrAG^^ 

TAGCAACATt^UACTTGGACTATGTtrrCTAr;^^ 

TTAGAAACAAAAACGA7 C^C77ATTAA7GGAAAC7AC33GACTGGATrTACAACAAXCA 
TCSCArSXfcCTKACXrACSAACTTAC^ 

T2?*niT CTrACGGGCTACGtf GAAT7 wVAACAArG7GGGGA^CGAACTTGA2rrG7ACAAAn 
CCTGACCA-C3jrrrCACJ7AC 



3744 SEQUENCE ID NO. 82

gaa~ cctt: act c: ^ :^ AA?rr:Acc37crrrsgccirACArcrcAtTrs^^^ 

AGAAGCTT'CTGACAGNAGGGCrGACAGCACCGATTCAiAACAC^^ 
AGACTTAAGACC^ACAAr^TGGGAC^ 
G*AGAAAAAGACrGCTrGTGTGGGAGAGAACUUrGAG»^ 
(^GMOTACWCTAZCCTACAAA^ 

ACATGA7AGAGAArrGArGAGAAAATAGCTG3C I' J I CGAAAATTTACrGAATTTGGG 
AAGGTGACGTTAAAACTTTTAGGAT!7AAG CAACTGAGC7TCAAGACTT^rcS , rCTt?GGGAAG 
GAA7 GGAAACACAGACGGGAATCACnrT GA 



374S SEQUENCE ID NO. 83

rGAGGATAGGCArGACAACAATGACAAACTAGGA 
rTCACrrcraAGTGGGACLV^TrACGCAAGGCACCCCTCr^ 
C7GG7TG77 CTtACACGA^tAAAAACtAGCG- CCAG^TGAXGCGCA^^GAGGAA^GG 
CGGGGTCGAGG~GGGA7GGrGGAtGACA^GGCrcrCGGCCGrGAAGC2^Ar^ 

ACrTCAT3AAC\c:-GACUTTXACCrCC»^ 

<CTA6ST35Tcr3caAC5 ^ TT c , :^ ^ ' cr^ crrcAACUcTTCcagAgccACAscArcArcA 

GGTrrGGAGCrGAAGAMAGAGAACATCCrc 



3743 ID NO. 84

GGATGGG7GCC C TT I 7AGACCArACAAGGTAAC77GCGGACGTTGCCArGGCATCrGTA 
AACTGTCATGGTGTTOGGGCGGAGTGT C : - rJ T A GCATGCTAAXGTA~ATAATTAGCGTA 
TAGTGAGGAGTGAGCATAACGACAGGTCACTCTCGTCACCAXwi" XjG I 'rTTGGTGGGTTT 
TGGGGAGCrrCTrrArrGCAACCAGrrrrATCAC^^GATCrTTA^ 
GCTGACTTCCtATCTCArcCCGTAAC7AAGAGTACG7AACCrCCrGCAAAT?GC^ 
GTAGGTGrTGGNCrTATTTTACGGAGCCCCrATTCAAGAXAGAGTrGCrc 

GTGGTCTGAAAXCACAGAAAGCTGAArrrGGAAAAAGG' itiCI TGGAGCTGCAGCCAGTAA 
ACAAGTrrrCATGCAGGTGT 



3"c SEQUENCE ID NO. 85

CT 2CAGT3AGC ZLAAAATCCTGCCACTGCAC^ T w*CTCCAG*- CTGGGTCA CAGGGCSAGC C 
CCTGOrT' CAACAAACAAA CAAACAAACAAAAAC— CACTTAG A. • • GTCCTATTA TACtJCA-A 
AT - *Ai. « • • • CAGTTACAA i • - * i 4 d««v.<««G\.v» * -A±* -vj- jiGAGACAAIvjGC 
CTAAAnAGGCArr^AAAT? CCA^AATAACArAAATTATCVCTAAATCTT^rAACTAATC 

ArjuTAtArArA rrrrAC A cpAV. : . 7 TCAraAar^rAGArrccArgcArATAAAAr^ 

v. *CCAATAJ'. Ivj * r V - '* * - * J'i J zAATAGAGCCACGGT^ •GCTACa . .GCCCAAG 
CTCCTT^T^AACTCCTCGGCCCAAGC^ATCCTCCTwCC^CAGCCTCTSACCCTGCCATTA 
CAC^T^CACCTGuCrTC C^ r i'T~?T ; rTlT^a AI^CCAC^GAGCaGGAAGaAAA 



2751 SEQUENCE ID NO. 86

CrGC\GATGAGAGGCACT^~^ 

GCCCAC37GO "I" 1' U i U * ^C^GAArCTGGTT CTAX AACAACSTTCCTAACAAGCrGT AGCC 
;LkAAAAArrT^?C^^?ATTATAATTArT7 

GTGTCT^--"*CACAr«rf 1 k lAAGTO*-*. *Ui J l ~-uuA7AT**vI AGACVi V- * ,CAAGCAA . 
CTTAGAGTGTAGCTGTTTTT CTCAGGTTAAAriATrcrTAGCT AGGA7TGGT GAG7TGGGG 
AAAAGTGACrrATJUGArACOAArrGAArTAAGAAA^A^ 
7AA-G"GG7GG7GATCTrCATrAACAC7SANC7^ 
GAA7C7A7ACCC£A77CA2f AGAAGA7AC 



3 752 SEQUENCE ID NO. 87
<nSC^CCAG7iAACXA<TT^ 

TACTTCtrTGTACAi - 1 * ±1 w* ^Jc 1 . iCAGwi^ i * *AGAAC7CA7aG7GACKG7C7G 
T7GC7AA7CC£AGG7C7AAC:^n7ACC773^^ 

er r : :" : i r - iiiu ^-asatsaga^ 

GAKAAGAT»<K3AAGAASA^^ 
Al'vaC'C C7 T7GGCC77G7GA7 "X C 



3733 SEQUENCE ID NO. 88

ctgcagtgttc c ' n u 't c7ccac77aaaaca7gakgta&7aacccc7cgsrrc7c7caac7g 

ct7caagg7«tsa7(^caxsctg^ 
oaaactca7acgtacagc7scc^ 

GycATA^^:cAATgxcAQ^^cc^cAT^ , . :u* r cereATcrr c -aTcaccAccrGCcaGga, 

GTTACC? 7GCC7CG7CCAT7AGA7AATGCG7CAGGGTGGCCAAGGC7CCG7C7G7C3TTG 
TtX7CCTGCCG7TCTC7ATrG7CA77C7A7AAGCACAAGAAAAACA7T7? CAC7AAATCA 
GAT7 C7CAGCAGAATCAAG 






3T54 SEQUENCE ID NO. 89
7AAc:crcAccsrrcxACAr7 rcr? ccrcc TrrAGC cTccr^AGTAGCTGwCAcrAr^GGT 

ACGTC CCIACTA- * CCTGAAAACACAAC CAGTTTTGAAGGTAGTGTCTGGGCCGGGCG CAG 

tggz rr cAw gc«j j' " — ^at c — cag ca u • * i cggagcztcgag^ r~jGCCGGATCAC.» • gagg T c 
AGc^~~AGACCAGcrr»ccx;c;r^CArAAGAcrccATcrcrA^ 

AAAATTAGC^GGCArGGTGCNGCVTGCrXrGT^ 

CGA G ArwTTGGTTGCAA CCTAGGAAGCAGAGGCTGTGGTGG AGCCGAGACCGGAC CATTGG 
ACTC ^CGCTGGGi^AACAAGAGTGAAAA— 

rrrrGNCGGGNGc;GGC^G7CACGc:r^AATCG:r^ 

GTCA^ZA^HHAGuAGTCCG 



3725 ID NO. 90

GAACCCCGCTGACATGTCCTATGTTCTTT'T CTC C GCTACTCCTTCCTACTGCCAC57 AATG 
AAGGG7AGGGC7CCaCC£7GGACCC7GAAG7AAGCTAGAGGTTAGAAGCT 
GAAC-GACAT7GAGTCC7T? GATGAACG T3AA GCCACCG7AC7AATCTGGACrGC CrACC T 
CTGCACTACTCTA^GAGAGAGA^AGTArGTGCATTATTTAAACCAGTTGGGTTGATTTTC 
TATrAACAA^CAGAAACATCTCTGTAAAA^ 

GT CAT ATGGTCT CGAGGGCAAACAC7CAACTG TGCTACTGCAGTGTGAAAG CAGGCACAG 
A(T^T3TArrAACCAAGGAGC^TGGTC\CT^ 
AZ^CTTGGTA TTACA C^I^GCwAXGGTAGGAGAAGATCrT 
CAATGTrGGT u - 1' 1" IAIACGNG 



3737 SEQUENCE ID NO. 91

GAArrCrCTCrr^\GAAGrrGCATACACAACACA 

TGT7CTCACACCCATCCZAAAXA CAACGGAGT CAGAAGTAAAG7CTG*GZrTGGCTGG GAAT 
ATTGGCACCTGGAA7AAAAAT U C T T1 ' , CTGTGAATGAGAAACAAGSGGAaGarGGATATG 

AAGC CA7A7TACCTTTCTTC"rGACAACGACTTG"rCAGCC CACGTt*Gl'l"^w.^;a aGG CAG AA 

TCTGG^CTATAACAAGTrCCTJUI^^ 

AT7ATTTCAAIA7AAA(K£CGCACrAGATGGAGCCAC7G7C?G 

T7Crr?CCAXATGTTAGACATT7?CTn^ 7 CTCA 

GGTTAAAA7TTC7TAG? TAG 



332.0 SEQUENCE ID NO. 93

TA?e^r:GCcrATT c r ::r; c^ 

CrTCTAGAGATAAGTTAArrTTTAG 4 * ^'r ui ^CCTCCTCACTGTGGAACAITCAAAAAAT 

AaVAAAAGGAAGCCAGGTGC^GTGtAATGCCAGGCTCAGAGG^ 

CT7GGGCCCA<K^GTTCACAAGCAGC7TGGGCAAC^ 

GAAAACAAAAAACAAATArrGGAACnArrTTATATG^ 

AATTAGTGTAAAATATArATArTATGATTA^^ 

TTTTGGGAmCAATGC C ' I ' L ' i VTA GGCCATr^CTCAAAAAAAlAAAAOCACGAAAACAA 

AAAAAGTTGTAACTTGAAAA^^ 

GGTHGGTTGGGTTGGTTGGT 






333 3 ID NO. 94



CArrrrrr: 



crrj'OTtKAGAoiKGrcTcactrraTCjCCcaGAcr^ 



gcacgazctcagc^cac^cjuicctc^ 
TafiTaGagAQcqsTTTcscc xTa ' r r j 



GAT CCCCCC!7CCTTCrtwTACTC CCCATCXj 



33 22 ID NO. 95

SSTGCTCCrrTCTAGAACTaGTGGacr:^^ 

AATGAAANGTCT CCCA wTCTA L .J CTACACAGACACSGCArCCATCCSTTTTTCT 

c^cTrrc~:ccAc:rrrrcccG7:r^ 
Grsnsrccaictn^^cc^G^c^ 



3333 SEQUENCE ID NO. 96

GG?CAAGGGAG^:3UG\C*TGGT^ 
QRTCTCTGCTTCACCAGTGTCQGC^crn^C^ 

GrrrArcsrcAAGc^TArrGAtrr^^ 



3836 SEQUENCE ID NO. 97

AAsgracsGArrccrrTAGSTAGarr^^ 

GA< aG7GTTA<^TCrAAT rcrATATCACA7GTAA u i. Vj T ATTIGGATATJlTwlfflAAZA 

gtgu 1 : iv; i ri^'i't i"::'! ; : r * m -inn ^ggngamagagtctcgctctgt 

CGCCAGGTTGGAGTGKAATGGTGCGArc 






* * ^75£i^j^ T?* wC:;: " i * ^** : ^^~^^^~^£^~gtcggcjaag^^ 

AC^SCGCCATSCGCTGGAAGAGACTCCGGCG~^ 

AC-CCC3ACC37GAAGG - , « G GAGGA TGGCCCGAAAGT^ «IU . ^AA&GAGATTGGCACTA 

AATACTG^GCAcrjcrGTrGAAz^TrrG^ 

^tgttsw * cAcciATArrGczcAGriArrOTCGGArAACcrsAArTArc 



3343 SEQUENCE ID NO. 99
TAATTATATTSAAACGCTTCT^CCTAGGTC^ 

ATT^GsnGcrcrrrrr^cArscAiT^^ 

TCTTrCAGAATrAACrACC^GTGC^ 

GA7SAATTACTC7GAAGT77TAAZ7GT^CC^^ 

ACTAGSTCACTATTACrAArrC^^ 

C7GAAAC5tAACTTAAGAC7ACAG7TAATrcrAAGC lJ lT l' GG GGAAGGAZTAZATAGC CTT 
CTAGTAGGAAGTCrrGTGCTArCAGAAIV; 1' '.^AAAGAAACGGT^rTCAAGGAATJIGTAr 
AAA^IACCAAAAATAATTGAT 



388* ID NO. 100

AAAAC\AAGCC7wTTGAGG77C7GAAAAC£GAA 
T7A7AC7Gr:A?A<UUAA»^ 

ATGTtTTTCTCCTCCAGGCCAAGCrGTCrAAGGACrGC^^ 
ICGCAGATTAC^TCTGAAATAGGArrrCACCAGGTCArC^ 

AAArrorrrA^CAArcAAArAcr^rAACAcc^ 
rrrGTc^cGacrrrcucT^c^ 

CXATAAGTAOIAACCGCZTGAGACUCCAAA^ 
C^C7T7C7GATTTtrCCAjSCAAAAG3GGGG 



3335 ID NO. 101

GGATCCGCCCTCC^ 

gc; :?;r A : r : , . * i\\ . i^'r^Gc^GAOGAGTCTrAcrcrG77G<:ccAAGCTGCAGr 

GCAGTGGTGCAATCTTGGTTCACtGrUACCTCCACCTCX^^ 
CTC^TTj^GGA^ 

AT VI TT ^TrAGTAGAGACAG G a i 'ITCACCArGTrGGCCAGg C r GO : ^ GAACTCCTSA 

CCTTGTCAXCCACCAGC^CSGCra 

AACCAGCCTAAAGTTTTAAAACATGCCAAG^ 

ACATA 






33 = 7 SEQUENCE ID NO. 102

gcat: 

gctgagtctc 
acttc:c:sga— : 

<K3GC\CTG7GArAAC<^7AAG^T^ 

crrraaGAACArrrrcarrT^^ 

QrTATCrnJGCGuCTGAAAAATGTTr 




GAAAAGGGAAAC^VWUCAGAACTTT^ 

CCArAGArTAC^r^AGOGAAACAAAC L ' : H ' U A TCAtGrsr^CCrCCASSCCAASC 
TGr^AAGC^CCGCaAAC^^ 



GCAGTGC7TCCAArrr:GC::?CAC7SCV^ 

CrCAGTTrcrC^GTAC^GwAT^aCGTGCCTG 

A?T7T7T^AG7AGAC^CU3GGTT::^ 

crrscnsjuc^ccAGGcrcsG^ 

CACVtCC:^^C7AAAG7777^ 
TATrGTACAATTAATTTTA'T 



3331 SEQUENCE ID NO. 105

GTcrrTcccArr^crcrAC^A o c tt u t C-crrrACAr^rTACTcerrGCCArTrTCAA 

GAAAGCATTGTGAGCTCTTCCAATCrCCirCACCrrTGGGCT^ 

AGArTAT tl " ' IS J A CAGC C ' .T T ' lA TGwCCA\TTAGC\TTCCArc^ArtTTA'rA'tCrAGCA 
tArrTGCGG STrAGA ATCC CATGGAT G T ' J ' 1 ' C T T CITTGACrATAACAAAATCrG GGGAG GA 
CAAAGS72A!TTrrCC?STG7CCXCATC7AACAAAG~ 
AGGTrCCTrCCAAGtCTTCCTGACCaCCrTGCACtArTGGAC^ 
&CAGAAAAC3A T:". ru aPACATA^CATCGCAGC^GC^C^GTGTCCCCC^TGGCA^ 



3sca SEQUENCE ID NO. 106

C^TCCr^CCGCCTTKCCTCCCAAAGTGCTGGGAXTACAGGCA^ 
GCTSAGTCTSCGATTtCTTGCC^GCTCTACCCAGTTG^ 
ACTTCTCTCCE^CCCrTCrCCT^AGTAAAAr^^ 
CXSGIArrGHGATAAGGAT 






2 5 S3 ID NO. 107



^777AC7CCAGCCTCSGCAACAAAAr3AGACCCTGw 



AATAATASCACAGTr^rATAO^^ 




ctmaaa«t:^ac7^x»ctc:^ 
crc^7GTT7ArcAAAG7Ac:rcrc^^ 

QTArArCCnAAAACCCAACA 



3H4 SEQUENCE ID NO. 108



TCrCXVAGTaCIAAGAITACAGGCXr 



TCCTAAGTAGCTGG 

_ TgTACTTrCTGTAG ATACGA GGTN 



cracAGcrrraAcrrccr^GGATtA^rca?irc 

AACrACAGGTStGC\CCXCC;rGCCCuGCT3A 

tsgccatgt: 



^rr^ACTGAAC^TrArGGGATGTACACGT: 



'STMA 



ACSCAGTCCArriCXGC^KTACTKArAAC^CCA^ 
AGArA?0C7GCTGGAG7CSA 



SAGGGACTAACCATAATG 



A* 



3373 SEQUENCE ID NO. 109

m:arTnTGTcrcTGCCGcrrAAArA77AAr: 



rTATACAITTA 



riAAGGCAG 



GGACcrcAcrnMracrrzr^^ 

A 



3373 SEQUENCE ID NO. 110
TSAATGAGAGAAGTCAGTAAA^A^ 

racccGTracrACAATA^ 

T 



3 530 ID NO. 111






3332 SEQUENCE ID NO. 113



C^7?IA2JG7CrG0rrG5J7Trr^ 

A&^=:rrACttrHGcssA»c3rsAcrcrcr:aTO 



3 233 ID NO. 114

Arrccesc3TAScc:^rAAc^TAC£<^ 
Gcr:^AG7SvK::rrrrcACccT^ 

AA«CAflCCCA^rG7ACCArAACr~wWAAAArrAAAAAAAAj^ 
T 



3334 ID NO. 115

AcrcrrcAcrrGCCA»r^aAccrrAAGCAAjE7^3AAcrTC^ 



3536 SEQUENCE ID NO. 116
CGgCgyTS re^ CGCrCTAGAACT^ 

CC*T I^fGV^I ^ VAAAAAGXATTAGAArC7C Aj.li *T ^ CT GAAGAAGGTTGGCAGTGGGTrG 

GGAGGAGGGATTGGAGATTSATCjGaTAGGAATGTGAAGGGAT^ 

A 



3397 ID NO. 117

TGGGXACCSC^CCCCC=TCSAiSrrC»C3CnMC»^ 

TTGCTC7TCITCCACCCCCTCGTrGGAAGTGTTCCrAA c»I G ITjl ' G GCnGgGCTC C ' rC 7 7 
C 






333a SEQUENCE ID NO. 118

CTT>j . G w^GGGCGAC^G^rTCrGGGCGAAGTCtHjCACGCCTC^ 

G ACGG CCT , - GA"s. « * - C • ^ * . GGTCCwACTGTCTCGAGGCATGCATGTCCAGTGAC 

7CT75TG. • TGCTuCT JCTTCCCTCTCAGA.TTCrr CTCACZGTTGTGGTCAGC7C7GCT7 



1533 SEQUENCE ID NO. 119
ATTCGTCGTAJTCGCG^TACrATAO 

GwGGGCSC^^ICG^i. * GGGGCwC^AACCCLoTtj7GC^AAACC^GGAGGC^CGGCCCS 

TTTCTGGGGCTTCGGGCGCGGCCGGGTGGACACIAGA 

T 



333C ID NO. 120 

TCGAGGTGGACwTAXCGArAAGCTTGArATCGAArTC ZTGCAGCCGGGGGGAT CCGCCC 
CGCGGCCTCCr^AAGTGCTGGGAT^ACAGGCGTGAGCCACCGCCCGGGOT 
ATTrCTATTSGCrAGCSCTSCTCTAAATCTr CTGTT^ 
C 



874-3570 SEQUENCE ID NO. 121

AAAgrC^TGGMTTCCTTTAGGTAGCrAC^rrATCA ^^ ' l ' 1 l^'l ti AGAATAAAATGrtATTG 
AGAG TGTTAOGTCTAArTCTAt rATGACArGTAA u ' l n JA ' lVl ' CK SAZAXATCAgTAATA^ 

TGcr rrrr u ^^TTritiGGGGA^GacrrcTCGCTcrGTc 

GC^GGrrGGAGTGCAATGGTGCGATCTrK 

AGTGATTC 7CCTG CG TCAGCCy CCr^cyrAGN7GGGACrAQGGGGXGCG^ 
TCGGATAATTTTGGG^rrrrrrAGTACA^ 
rTGGAACTC CIGAgA TCATGATCTGCGTGCGTtJ^ 
GGGTGAGCCACTGTTCCC 






835 -33 £,555 SEQUENCE ID NO. 122

CC7TAA* CAAAAC. - . GA7 » ACACAGTCCL - - *'AAGGCAlj*Vv--v** - * . AACCCCAGG 
TKGTTAAArA~CCAGCTA~C7^GGAGCT7TT^^ 

tctctacc ctg^ ^caca— AgAAtc^^ 

TSAGATArTCTTACTCAATrrA^^ 

tttctgt^ggaac:^ 

GTAA GOTT GG T^ ± AC AGCK^ ATAAACAgArCCrrCCTTAGNCCC7gcCAC7TAArCACr 
GAGAGTTTGCGTGG^GG l"! * JOJATTTAATGAC 



GACACAC ATTCACACATAATTATGAAAGCATTTTCAGGCAAAACrCAA^ CACAAG7C7GG 

GTTTTTAACATAGTTAACrGAATA:^ 

ArSt^TCTGGAAGAAGAGCTAwAAAAAAAC^^ 

TTArw2JAC\CATTGTrArTTTATC"CrrAAT>JCrAGtAAAG 

GTAAJaACTA CTrCG AAAAAAirTTA^ 

ilin >* mm^*m 



aaS.iaT SEQUENCE ID NO. 124

GCTCATCACGCTT CACGGGGGAGGC7GT CCGGGAAGAAr GCT CCCACACAG?? ArAAAGAA 

TGCTCCGGCACAGGATAGAGAA-CCGCCGGOCAGCATAGAGAAGCCCCC^^ 

GAGA ACGCCCC CICACAGCArAGAGAAgCCCCCGQ^^ 

CTGGw* 4. - - *.^CCAGCCAAACTAAAATCACAGAGG3C^CACATCATrTAAGArAGAAA 

TTrcTGTArrrrrrAArrrrrrroju^^ 

ACTAC^ATTAAG^rA^ 
GCTAGNjr'rTlTlTrCCAGST^^^ 

CCGCGAGG GAAAT ATTCAGTTAACrArGTrA AAAAC CCAGA C L I ' UlU ATTSa G^ ' Tig CC 
TGAAAATGCTTT^ATAA t . ATGTG73AATGTGTG7C 



a34.515.C2J SEQUENCE ID NO. 125

GIArAATGCACGTaCrATAAGGTCAGCATGAGACACAGATCr ? T U C r T'i'CCACCSTC'rTC 
TTCrrZArGGTTSGGTArTCTTGTCA^ 




849 5'-atc-tccggcaggcatatci:-3' Sequence ID No. 126

892 5'-tgaaatcacagccaagatgag-3' Sequence ID No. 127

885 5'-tggagactggaacacaac-3' Sequence ID No. 128

893 5'-gtgtggccagggtagagaact-3' Sequence ID No. 129

890 5'-ccatagcctGtttcgtagc-3' Sequence ID No. 130

891 5'-ccatagcctAtttcgtagc-3' Sequence ID No. 131
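
Primers 890 and 891 are printed as identical except for one capitalized base (G in 890, A in 891). The following is a minimal Python sketch, not part of the original listing, for locating where two equal-length oligonucleotides differ. The function name `mismatch_positions` is hypothetical, and the primer strings are copied as printed above.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 1-based positions (read 5'->3') at which two equal-length oligos differ."""
    if len(a) != len(b):
        raise ValueError("oligos must have the same length")
    # Compare case-insensitively so the capitalized marker base is treated as an ordinary base.
    return [i + 1 for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


# Primer sequences copied from the listing (primers 890 and 891).
primer_890 = "ccatagcctGtttcgtagc"
primer_891 = "ccatagcctAtttcgtagc"

positions = mismatch_positions(primer_890, primer_891)
print(positions)
```

The comparison is deliberately strict about length, since a positional comparison of primers of unequal length would require an alignment step that this sketch does not attempt.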






SEQUENCE ID NO. 132



FILE NAME: ARMP.UPD



TGGGACAGGCAGCTCCGGGGTCCGCGGTTTCACATCGGAAACAAAACAGCGGCTGGTCTGGAAGG 

AACCTGAGCTACGAGCCGCGGCGGCAGCGGGGCGGCGGGGAAGCGTATACCTAATCTGGGAGCCT 

GCAAGTGACAACAGCCTTTGCGGTCCTTAGACAGCTTGGCCTGGAGGAGAACACATGAAAGAAAG 

AACCTCAAGAGGCTTTGTTTTCTGTGAAACAGTATTTCTATACAGTTGCTCCAA 

CCTGCACCGTTGTCCTACTTCCAGAATGCACAGATGTCTGAGGACAACCACCTGAGCAATACTGT 

ACGTAGCCAGAATGACAATAGAGAACGGCAGGAGCACAACGACAGACGGAGCCTTGGCCACCCTG 

AGCCATTATCTAATGGACGACCCCAGGGTAACTCCCGGCAGGTGGTGGAGCAAGATGAGGAAGAA 

GATGAGGAGCTGACATTGAAATATGGCGCCAAGCATGTGATCATGCTCTTTGTCCCTGTGACTCT 

CTGCATGGTGGTGGTCGTGGCTACCATTAAGTCAGTC\GCTTTTATACCCGGAAGGATGGGCAGC 

TAATCTATACCCCATTCACAGAAGATACCGAGACTGTGGGCCAGAGAGCCCTGCACTCAATTCTG 

AATGCTGCCATCATGATCAGTGTCATTGTTGTC\TGACTATCCTCCTGGTGGTTCTGTATAAATA 

CAGGTGCTATAAGGTCATCCATGCCTGGCTTATTATATCATCTCTATTGTTGCTGTTCrTTTTTT 

CATTCATTTACTTGGGGGAAGTGTTTAAAACCTATAACGTTGCTGTGGACTACATTACTGTTGCA 

CTCCTGATCTGGAATTTTGGTGTGGTGGGAATGATTTC CATTCACTGG AAAGGT CCACTTCGACT 

CCAGCAGGCATATCTCATTATGATTAGTGCCCTCATGGCCCTGGTGTTTATCAAGTACCTCCCTG 

AATGGACTGCGTGGCTCATCTTGGCTGTGATTTCAGTATATGATTTAGTGGCTGTTTTGTGTCCG 

AAAGGTCCACTTCGTATGCTGGTTGAAACAGCTCAGGAGAGAAATGAAACGCTTTTTCCAGCTCT 

CATTTACTCCTCAACAATGGTGTGGTTGGTGAJ^TATGGCAGAAGGAGACCCGGAAGCTCAAAGG^ 

GAGTATCCAAAAATTCCAkGTATAATGC\GAAAGCACAGAA^ 

GAGAATGATGATGGCGGGTTCAGTGAGGAATGGGAAGCCC\GAGGGACAGTCATCTAGGGCCTC\ 

TCGCTCTACACCTGAGTCACGAGCTGCrGTCC^GGAACT^ 

ACCC\GAGGAAAGGGGAGTAAAACTTGGATTGGGAGATT^ 

AAJlGCCTCAGC\aCAGCCAGTGGAGACTGGAJ^CAC^CCATAGCCT 

TGGTTTGTGCCTTACATTATTACTCCTTGCCATTTTCAAGAAAGCATTGCCAGCT 

CCATCACCTTTGGGCTTGTTTTCTACTTTGC 

TTAGCATTCCATCAATTTTATATCTAGCATATTTGCGGTTAGAATCCCATGGATGTTTCTTCTTT 

GACTATAACCAAATCTGGGGAGGACAAAGGTGATTTTCCTGTGTCCACA 

TTCCCGGCTGGACTTTTGCAGCTTCCTTCCAAGTCT^ 

GAAGGAGGTGCCTATAGAAAACGATTTTGAACATACTTCATCGCAGTGGACTGTGTCCCTCGGTG 
CAGAAACTACCAGATTTGAGGGACGAGGTCAAGGAGATATGATAGGCCCGGAAGTTGCTGTGCCC 
CATCAGCAGCTTGACGCGTGGTCACAGGACGATTTCACTGACACTGCGAACTCTCAGGACTACCG 
GTTACCAAGAGGTTAGGTGAAGTGGTTTAAACCAAACGGAACTCTTCATCTTAAACTACACGTTG 
AAAATCAACCCAATAATTCTGTATTAACTGAAT^ 

GCAGGCACCAGCAGCAGAATGGGGAATGGAGAGGTGGGCAGGGGTTCCAGCTTCCCTTTGATTTT 
TTGCTGCAGACTCATCCTTTTTAAATGAGACTTGT^^ 

GTAGATTGCCTTTGGCAATTCTTCTTCTCAAGCACTGACACTCATTACCGTCTGTGATTGCCATT 
TCTTCCCAAGGCCAGTCTGAACCTGAGGTTGCTTTATCCTAAAAGTTTTAACCTCAGGTTCCAAA 
TTCAGTAAATTTTGGAAACAGTACAGCTATTTCTCATCAATTCTCTATCATGTTGAAGTCAAATT 
TGGATTTTCCACCAAATTCTGAATTTGTAGACATACTTGTACGCTCACTTGCCCCCAGATGCCTC 
CTCTGTCCTCATTCTTCTCTCCCACACAAGCAGTCTTTTTCTACAGCCAGTAAGGCAGCTCTGTC 
RTGGTAGCAGATGGTCCCATTATTCTAGGGTCTTACTCTTTGTATGATGAAAAGAATGTGTTATG 
AATCGGTGCTGTCAGCCCTGCTGTCAGACCTTCTTCCACAGCAAATGAGATGTATGCCCAAAGCG 
GTAGAATTAAAGAAGAGTAAAATGGCTGTTGAAGCAAAAAAAAAAAAAAAAAA^ 



SEQUENCE ID NO. 133



FILE NAME: ARMP.PRO



MTELPAFL^YFQNAQMSEDNHI^NTVRSQNDNRZRQEHNDRRSL^^ 

DEEEDEELTLKYGAKHVIMLFVPVTLCJIVVWATIKSVS 

HSILNAAIKISVIVVOTII^WLY 

ITVALLIWNFGWGMISIHWKGPLRLQQAYLIM^ 

VLCPKGPLI^VETAQERNETLFPALIYSSTM^ 

DTVAZNDOGGFSEEWZAQRDSHLGPHRSTPESRAAVQELS^ 

VLVGXASATASGOWNTTIACFVAILIGLCLTIJ^^ 

FMDQLAFHQFYI* 






SEQUENCE ID NO. 134
FILE NAME: KAfttlP.UPD 



ACCANACANCGGCAGCTGAGGCGGAAACCTAGGCTGCGAGCCGGCCGCCCGGGCGCGGAGAGAGA 

AGGAACCAACACAAGACAGCAGCCCTTCGAGGTCTTTAGGCAGCTTGGAGGAGAACACATGAGAG 

AAAGAATCCCAAGAGGTTTTGTTTTCTTTGAGAAGGTATTTCTGTCCAGCTGCTCCAATGACAGA 

GATACCTGCACCTTTGTCCTACTTCCAGAATGCCCAGATGTCTGAGGACAGCCACTCCAGCAGCG 

CCATCCGGAGCCAGAATGAC\GCCAAGAACGGCAGCAGCAGCATGACAGGCAGAGACTTGACAAC 

CCTGAGCCAATATCTAATGGGCGGCCCCAGAGTAACTCAAGACAGGTGGTGGAACAAGATGAGGA 

GGAAGACGAAGAGCTGACATTGAAATATGGAGCCAAGCATGTCATCATGCTCTTTGTCCCCGTGA 

CCCTCTGCATGGTCGTCGTCGTGGCCACCATCAAATCAGTCAGCTTCTATACCCGGAAGGACGGT 

CAGCTAATCTACACCCCATTCACAGAAGACACTGAGACTGTAGGCCAAAGAGCCCTG<^ 

CCTGAATGCGGCCATCATGATC^GTGTCATTGTCATTATGACCATCCTCCTGGTGGTCCTGTATA 

AATACAGGTGCTACAAGGTCATCCACGCCTGGCrTATTATTTC\TCTCTGTTGTTGCTGTTCTTT 

TTTTCGTTCATTTACTTAGGGGAAGTATTTAAGACCTACAATGTCGCCGTGGACTACGTTACAGT 

AGCACTCCTAATCTGGAATTTTGGTGTGGTCGGGATGATTGCCATCCACTGGAAAGGCCCCCTTC 

GACTGCAGCAGGCGTATCTCATTATGATCAGTGCCCTCATGGCCCTGGTATTTATCAAGTACCTC 

CCCGAATGGACCGCATGGCTCATCTTGGCTGTGATTTCAGTATATGATTTGGTGGCTGTTTTATG 

TCCCAAAGGCCCACTTCGTATGCTGGTTGAAACAGCTCAGGAAAGAAATGAGACTCTCTTTCCAG 

CTCTTATCTATTCCTCAACAATGGTGTGGTTGGTGAATATGGCTGAAGGAGACCC^GAAGCCCAA 

AGGAGGGTACCCAAGAACCCC^GTATA^CACAC\AAGAGCGGAGAGAGAGACACAGGAC\GTGG 

TTCTGGGAACGATGATGGTGGCTTCAGTGAGGAGTGGGAGGCCCAAAGAGACAGTCACCTGGGGC 

CTCATCGCTCCACTCCCGAGTCAAGAGCTGCTGTCC\GGAACTTTCTGGGAGCATTCTAACGAGT 

GAAGACCCGGAGGAAAGAGGAGTAAAACTTGGACTGGGAGATTTCATTTTCTAC^ 

TGGTAAGGCCTCAGCAACCGCC^GTGGAGACTGGAACACAACC^ 

TGATCGGCCTGTGCCTTAC\TTACrCCrGCTCGCCATTTTCAAGAAAGCGTTGC 

ATCTCCATCACCTTCGGGCTCGTGTTCTACTTCGCCACGGATT^ 

CCAACTTGCATTCCATCAGTTTTATATCTAGCCTTTCTGCAGTTAGAACATGGA 

TGATTATCAAAAACACAAAAACAGAGAGCAAGCCCGAGGAGGAGACTGG 

TCAGCTAACAAAGGCAGGACTCCAGCTGGACTTCTGCAGCTTCCTTCCGAGTCTCCCTAGCCA 

CGCACTACTGGACTGTGGAAGGAAGCGTCTACAGAGGAACGGTTTCCAACATCCATCGCT 

AGACGGTGTCCCTCAGTGACTTGAGAGACAAGGACAAGGAAATGTGCTGGGCCAAGGAGCTGCCG 

TGCTCTGCTAGCITTGACCGTGGGCATGGAGATTT^ 

AAGTGAGGTGAACC ^ 






SEQUENCE ID NO. 135
FILE NAME: KARHP.PRO 



HTEIPAPLSYFQMAQKSEDSHSSSAIRSQNDSQERQQQHDRQRLONPEPISNGRPQSNSRQWEQ 
DESEDESLTIJnfGAKHVTJttFVPVTLOF/VVVATIKSVS FYTRKDGQLIYTPFTEDTETVGQRAL 
KS I UTAAIMIS VIVTMTILLWLYKYRCYKVIHAWLI IS S LLLLFFFS FI YLGEVFKTYNVAVD £ 
VTVAIJLIWNFGVyGMIAIKWKGPnRI^QAYIxIHISAmALVTI 
VLCPKGPIilMLVETAQERNETLFPAIJYSSTMVWLVl^^ 

DSGSGNDTOGFSF^WEAQRDSHLGPHRSTPESRAAVQELSGSILTSEDPEERGVKLGLGDFIFYS 
VLVGKASATASGDWTTIACFVAILIGLCLTIiLIAIFKKALPAIiPISITFGLVFYFATDYLV'QP 
FMDQLAFHQFYI* 




SEQUENCE ID NO. 136

10 20 30 40 50 60 

GAATTCGGCA CGAGGGCATT TCCAGCAOTG AGGAGACAGC CAGAAOCAAG CTTTTGGAGC 

70 80 90 100 110 120 

TGAAGGAACC TGAGACAGAA GCTAGTCCCC CCTCTGAATT TTACTGATGA AGAAACTGAG 

130 140 150 160 170 180

GCCACAGAGC TAAAGTGACT TTTCCCAAGG TCGCCCAGCG AGGACGTGGG ACTTCTCAGA 

190 200 210 220 230 240 

CGTCAGGAGA OTGATGTGAG GOAGCTOTOT GACCATAGAA AGTGACGTGT TAAAAACCAG 

250 260 270 280 290 300 

CGCTGCGCTC ITTOAAAGCC AGGGAGCATC ATTCATTEAG CCTGCTGAGA AGAAGAAACC 

310 320 330 340 350 360 

AAGTGTCCGG GATTCAAGAC CTCTCTQCGG CCCCAAOTGT TCGTGGTOCT TCCAGAGGCA 

370 380 390 400 410 420 

GGGCTATGCT CACATTCATG GCCTCTGACA GCGAGGAAGA AGTGTGTGAT GAGCGGACGT 

430 440 450 460 470 480 

CCCTAATGTC GGCCGAGAGC CCCACGCCGC GCTCCTGCCA GGAGGGCAGG CAGGGCCCAG 

490 500 510 520 530 540 

AGGATGGAGA GAATACTGCC CAGTGGAGAA GCCAGGAGAA CGAGGAGGAC GGTGAGGAGG 

550 560 570 580 590 600 

ACCCTGACCG CIATGTCTGT AGTGGGOTTC CCGGGCGGCC GCCAGGCCTG GAGGAAGAGC 

610 620 630 640 650 660 

TOACCCTCAA ATACGGAGCG AAGCATGTGA TCATGCTGTT TGTGCCTGTC ACTCTGTGCA 

670 680 690 700 710 720 

TGATCGTGGT GGTAGCCACC ATCAAGTCTG TGCGCTTCTA CACAGAGAAG AATGGACAGC 

730 740 750 760 770 780 

TCATCTACAC GCCATTCACT GAGGACACAC CCTCGGTGGG CCAGCGCCTC CTCAACTCCG 

790 800 810 820 830 840 

TGCTGAACAC CCTCATCATG ATCAGCQTCA TCQTGGTTAT GACCATCTTC TTQGTGGTGC 

850 860 870 880 890 900 

TCTACAAGTA CCGCTGCTAC AAGTTCATCC ATOGCTGGTT GATCATGTCT TCACTGATGC 

910 920 930 940 950 960

TGCTGTTCCT CTTCACCTAT ATCTACCTTG GGGAAGTGCT CAAGACCTAC AATGTGGCGA 

970 980 990 1000 1010 1020 

TGGACTACCC CACCCTCTTG CTGACTGTCT GOAACTTCOG GGCAOTOGGC ATGGTGTGCA 

1030 1040 1050 1060 1070 1080 

TCCACTGGAA OGGCCCTCTG GTGCTGCAGC AGGCCTACCT CATCATGATC AGTQCGCTCA 

1090 1100 1110 1120 1130 1140 

TGGCCCTAGT GTTCATCAAG TACCTCCCAG AOTGGTCCGG GTGGGTCATC CTGGGCGCCA 

1150 1160 1170 1180 1190 1200 

TCTCTGTGTA TGATCTCGTG GCTGTGCTGT GTCCCAAAGG GCCTCTGAGA ATGCTGGTAG 

1210 1220 1230 1240 1250 1260 

AAACTGCCCA GGAGAGAAAT GAGCCCATAT TCCCTCCCCT GATATACTCA TCTGCCATGQ 






1270 1280 1290 1300 1310 1320 

TOTGGACGGT TGGCATGGCG AAQCTGGACC CCTCCTCTCA GGGTGCCCTC CAGCTCCCCT 



1330 1340 1350 1360 1370 1380

ACGACCCOGA GATGGAAGAA GACTCCTATG ACAGTTT7QG GGAGCCTTCA TACCCCGAAG

1390 1400 1410 1420 1430 1440

TCTTTQAOCC TCCCTTGACT GGCTACCCAG QOGAGGAGCT QOAQGAAGAG GAGGfcAAGGG

1450 1460 1470 1480 1490 1500

GCGTGAAGCT TGGCCTCGGG GACTTGATCT TCTACAGTGT GCTGOTGOGC AAGGCGGCTG

1510 1520 1530 1540 1550 1560

CCACGGGCAQ CGGGGACTGG AATACCACGC TOGCCTGCTT CQTGQCCATC CTCATTQOCT

1570 1580 1590 1600 1610 1620

TGTGTCTGAC CCTCCTGCTG CTTCCTGTOT TCAAGAAGGC GCTGCCCGCC CTCCCCATCT

1630 1640 1650 1660 1670 1680

CCATCACGTT CGOOCTCATC TTTTACTTCT CCACGGACAA CCTGOTGCGG CCGTTCATGG

1690 1700 1710 1720 1730 1740

ACACCCTGGC CTCCCATCAG CTCTACATCT GAGGGACATG GTGTGCCACA GOCTQCAAGC

1750 1760 1770 1780 1790 1800

TOCAOGGAAT TOTCATTGGA TGCAGTTGTA TAOTTTTACA CTCTAQTGCC ATATA'l'lTTT

1810 1820 1830 1840 1850 1860

AAC1ACTTTTC TTTCCTTAAA AAATAAAGTA [illegible] TGGTGAGOAG GAGGCAGAAC

1870 1880 1890 1900 1910 1920

[illegible] GTQCCAQCTG TTTCATCACC AGACTTTOOC TCCCQCTITG GGGAGCGCCT

1930 1940 1950 1960 1970 1980

[illegible] ACAGGAAGCA [illegible] [illegible] ACTGAOAAGG TCAGATTAGG

1990 2000 2010 2020 2030 2040

GTGGGGAGAA GAGCATCCGG C&TGAGGGCT GAGATGCCCA AAOAflTQTGC TCGGGAGTGG

2050 2060 2070 2080 2090 2100

CCCCTGGCAC CTOGGTGCTC TGGCTGGAGA GQAAAAGCCA GTTCCCTACG AGGAQTOTTC

2110 2120 2130 2140 2150 2160

CCAATGCTTT GTCCATGATG TCCTTGTTAT TTTATINCCY TTANAAACTG ANTCCTNTTN

2170 2180 2190 2200 2210 2220

TINTTOCGGC AOTCACHCTO CTGGGRAGTG QCTTAATAGT AANATCAXTA AANAGNTGAG

2230 2240 2250 2260 2270 2280

TCCITTTTAGA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

2290 2300 2310 2320 2330 2340

AAAAA






Primers



969 

970 

989 

990 

994 

995 

1003 

1004 

999 

996 

100 



ggtaccgccaccatgacagaggtacctgcac Sequence ID No. 138
gaattcactggctgtagaaaaagac Sequence ID No. 139

ggatccggtccacttcgtatgctg Sequence ID No. 140

ttttttgaattcttaggctatggttgtgttcca Sequence ID No. 141



ga-ttagtggttgttttgtg 

gattagtggcfcgttttgtg 

tttttccagctctcattta 

tttttccagttctcattta 

tacagtgttctggttggta 

aaa cfri ggattgggagat 

tacagtgttgtggt-tggta 



Sequence ID No. 142
Sequence ID No. 143
Sequence ID No. 144

Sequence ID No. 145
Sequence ID No. 146
Sequence ID No. 147







SEQUENCE ID NO. 149
FILE NO. 374-984.GEN

GTCT\GA m AAGNCAACATTC\GGGGTAGA.\GGGGACTQTTTATTTTTTCCTTTAGTCTCTCTTA 
AAGAGTGAGAAAAATTTTCCCAGGAATCCCGGT 

AAGTT\CAACCCCACAACCTTAGAGCTTTTGTTAGGAJ\.GAGGCTTGGTGGGATTACCGTGCTTGG 
CTTGGCTTGGTCAGGATTCACCACCAGAGTCATGTGGGAGGGGGTGGGAACCCAA^ 

ATTCTGC^CTCAG^AAJVTAAAGGAGAAAATAGCTGTTC 

GCCCA T GCTTTGTGGTTTAAGGGCCAGCTAGTTACAATGACAGCTAGTTACTGTTTCCATGTAAT 
TTTCTTAAAGGTATTAAATTTTTCTAAATATTAGAGCTGTAACTTCCACTTTCTCTTGAAGGCAC 
AGWAAGGGAGTCACAAGACACTGTTGCAGAGA^TGATGATGGCGGGTTCAGTGAC-GAATGGGAAS 
CCCAGRGGGACANTCATCTAGGGCCTCATCGCTCTACACCTGAGTCACGAGCTKCTNTCCAGGRA 
CTTTCCANCAGTATCCTCGCTGGTGAAGACCCAGAGGAAAGNATGTTCANTTCTCCATNTTTCAA 
AGTCATGGATTCCTTTAGGTAGCTACATTATCAACCTTTTTGAGAATAAAATGAATTGAGAGTGT 
TACAGTCTAATTCTATATCACATGTAACTTTTATTTGGATATATCAGTAATAGTGCTTTTTYNTT 
TTTTTTTTTTTTTTTTTTTTTTTTTNGG1IGANAGAGTCTCGCTCTGTCGCCAGGTTGGAGTGCAA 
TGGTGCGATCTTGGCTCACTGAAAGCTCC\CCNCCCGGGTTCAAGTGATTCTCCTGCCTCAGCCN 
CCCAAGTAGNTGGGACTACAGGGGTGCGCCACCACGCCTGGGATAATTTTGGGNTTTTTAGTAGA 
GA T GGCGTTTCACCANCTTGGNGCAGGCTGGTCTTGGAACTCCTGANATCATGATCTGCCTGCCT 
TAGCCTCCCCAAAGTGCTGGGATTNCAGGGGTGAGCCACTGTTCCTGGGCCTC 






SEQUENCE ID NO. 150 
FILE NO. 885-1012.GEN

CTGCAG T GAC^CGAC-ATC\TGCrGCTGTACTCCAGCCTGC^CC\CAGAGCCW.CTCC\TCrCCC 
AAAA AAA \AAAATATTAATTAATATGATNAAATGATGCCTATCTCAGAATTCTTGTAAGGATTTC 
TTAGKAC\AGTGCTGGG7ATAAACTATANAT7CRATAGATGNCGATTA77ACTTAYTATTGTTAT 
TGATAAA^AACAGCAGCATCTACAGTTAAGACTCCAGAGTCAGTCACATAGAATCTGGNACTCCT 
A^TGTAGNAAACCCC2JM4AGAAAGAAAACACAGCTGAA 

TTCTCTCA^CATTGTGGGGTTGAGTAGGGCAGTGATATTTTTGAATTpTGAAATCATANCAAAG 
AGTGACCAACTTTTTAATATTTGTAACCTTTCCTTTTTAGGGGGAGTAAAACTTGGATTGGGAGA 
TTTCA^TTCTACAGTGTTCTGGTTGGTAAAGCCTCAGCAACAGCCAGTGGAGACTGGAACACAA 
CCATAGCCTGTTTCGTAGCCATATTAATTGTMMSTATACACTAATAAGAATGTGTCAGAGCTCTT 
AATGTCMAAACTITGATTACACIAGTCCCTTTAAGGCAGTTCTGTTTTAACCCCAGGTGGGTTAAA 




AUiUtiiAuiia 1 v-^rvv— x l vjvjwixjv- x x i. iflAflAv. iui w 

TTAATTGTGTAGTTTTTAAAATTCCCCAGGAAATTCTGGTATTTCTGTTTAGGAACCGCTGCCTC 
AAGCCTAGCAGCACAGATATGTAGGAAATTAGCTCTGTAAGGTTGGTCTTACAGGGATAAACAGA 
TCCTTCCTTAGTCCCTGGACTTAATCACTGAGAGTTTGGGTGGTGGTTTTGGATTTAATGACACA 
ACCTGTAGCATGCAGTGTTACTTAAGAC 






SEQUENCE ID NO. 151



FILE NO. 901-912.GEN
GGA^CCCTCGC r ~nT^AGACCATAC\AC<:-TAACT 

TC^GGTGTTGGCGGG^i^GTGTCTTTTAGCATGCTAATGTATTATAATTAGCGTATAGTGAGCAG 
T^AG^A^AACCAGAGGTCACTCTCCTCACCATCTTGGTTTTGGTGGGTTTTGGCQAGCTTCTTTA 
TTGCAACCAGTTTrATCAGCAAGATCTTTATGAGCTGTATCTTGTGCTGACTTCCTATCTCATCC 
CGN?\CrA^GAGTACCTAACCTCCTGOlAATTGMAGNCa^GNAGG7CTTGGNCTTATTTNACCCA 
GCCcllTATTCAARATAGAGTNGYTCTTGGNCCAAACGCCYCTGACACAAGGATTTTAAAGTCTTA 
TTAATTAAGGTAAGATAGKTCCTTGSATATGTGGTCTGAJVATCACAGAAAGCTGAATTTGGAAAA 
AGGTGCTTGGASCrGCAGCCAGTAAACAAGTTTTCATGCAGGTGTCAGTATTTAAGGTACATCTC 
AAAGGATAAGTAGyVTTGTGTATGTTGGGATGAACAGAGAGAATGGAGCAAlICCAAGACCCAGGT 
AAAAC-AGAGGACCTGAATGCCTTCAGTGAACWGATAGATAATCTAGACTTTTAAACTGCATAC 
TTCCrGTACATTGTTTTTTCTTGCTTCAGGTTTTTAGAACTCATAGTGACGGGTCTGTTGTTAAT 
CCCAGGTCTAACCGTTACCTTGATTCTGCTGAGAATCTGATTTACTGAAAATGTTTTTCTTG7GG 
TTATAGAATGACAATAGAGAACGGCAGGAGCACAACGACAGACGGAGCCTTGGCCACCCTGANCC 
ATTA T CTAATGGACGACCCAGGGTAACTCCCGGCAGGTGGTGGANCAAGATGAGGAAGAAGATGA. 
GGANCTGACATTGAAATATGNCGSCAAGCATGTGATCATGCTCTTTGKCCCTGTGACTCTCTGCA 
TGGTGGTGGTCGTGGNTACCATTAAGTCAGTCAGCTTTTATACCCGGAAGGATGGGCAGCTGTAC 
GTATGAGTTTKGTTTTATTATTCTCAAASCCAGTGTGGCTTTTCTTTACAGCATGTCATCATCAC 
CrTGAAGGCCTCTNCATTGAAGGGGCATGACTTAGCTGGAGAGCCCATCCTCTGTGATGGTCAGG 
AGCAGTTGAGAGANCGA.GGGGTTATTACTTCATGTTTTAAGTGGAGAAAAGGAACACTGCAGAAG 
TATGTTTCCTGTATGGTATTACTGGATAGGGCTGAAGTTATGCTGAATTGAACACATAAA7TCTT 
TTCCACCTCAGGGNIATTGGGCGCCCATTGNTCTTCTGCCTAGAATATTCTTTCCTTTNCTNACT 
TKGGNGGATTAAATTCCTGTCATCCCCCTCCTCTTGGTGTTATATATAAAGTNTTGGTGCCGCAA 

AAGAAGTAGCACTCGAATATAAAATTTTCC7TTTAATT 

GAAGGGTGCACCC^ACA.GATGGAACAATGGCA-AGCGCACATTTGGGACAAGGGAGGGGAAAGGG 
TTCTTA^CCCTGACACACGTGGTCCCTGCTGNTGTGTNCTNCCCCCACTGANTAGGGTTAGACTG 
GACAGGCTTAAACTAATTCCAATTGGJrrAATTTAAAGAGAATNATGGGGTGAATG 
AGTCAAGGAAGAGNAGGTAGNAGGTAACTTGAATGA 






SEQUENCE ID NO. 152
FILE NO. 910-915.GEN



GTC^A^WJiaACCAACATTGCCA^ 
GG~CTk^ACC\AGTATTCNCCAATTTGTG^ 

TT^a^ACATTGTCTGTGCCTGCTTTCACACTACAGTAGCACAGTTGAGTGTTTGCCCTGGAGACC 
A^ATGACCCATAGAGCTTAAAATATTCAGTCTGGCTTTTTACAGAGATGTTTCTGACTTTGTTAA 
TAGAAAA T CAACCCAACTGGTTTAJ\ATAATGCACA7ACTTTCTCTCTCATAGAGTAGTGCAGAGG 
TAGNCAG^CCAGATTAGTASGGTGGCTTC^CGTTCATCCAAGGACrCAATCrCCTTCTTTCTTCT 
TVAGCTTCT^CCTCTAGCTTACTTCAGGGTCC\GGCrGGAGCCCTASCCTTCATTTCTGAC\GT 
AGGAAGGAGTAGGGGAGAAAAGAACATAGGACATGTCAGCAGAATTCTCTCCTTAGAAG7TCCAT 
AC\CAACACATCTCCCTAGAAGTC\TTGCCCTTACTTGTTCTCATAGCCATCCTAAATATAAC-C-G 
AG T CAGAAGTAAAGTCTXXOTGGCTGGGAATATTGGCACCT 

TGAGAAACAAGGGGAA.GATGGATATGTGACATTATCTTAAGACAACTCCAGTTGCAATTACTCTG 
CAGA T GAG^GGCACTAATTATAAGCCATATTACCTTTCTTCTGACAACCACTTGTCAGCCCNCGT 
GGTTTCTGTGGCAGAATCTGGTTCYATAMCAAGT^ 

~™ , wn i ar:/-Ti<-r-r'Ar^ariaTnr:Af:CCAGTGTCTGCTTCACATGTT 




TTAAGAAAAAGAAAATTuTLiliji lUiAbjiiju : mhj i ivwi v. j. i w« inn wiv.iw.^ 
TAGGGCTT r 'KGKGTTTGKTTTATTGTAGAATCTATACCCCATTCANAGAAGATACCGAGACTG7G 
GGCCAGAGAGCCCTGCACTCAATTCTGAATGCTGCCATCATGATCAGNGTCATTGTWG7CATGAC 



TAITOCTCCTGGTGGTTCWGTATAAATACAGGTGCTATAAGGTGAGC=lTGAGACACAGATCT77GN 
TTTCC^CCCTGTTCTTCTTATGGTTGGGTATTCTTGTCACAGTAACTTAJ^CTGATCTAC^AAAGA 
AAAAA^GTTTTGTCTTCTAGAGATAAGTTAATT 

CAAAAAATACAAAAAGGAAGCCAGGTGC\TGTGTAATGCCAGGCTCAGAGGCTGAGGCAGGA.GGA 
TCGCTTGGGCCCAGGAGTTCACAAGCAGCTTGGGCAACGTAGCAAGACCCTGCCTCTATTAAAGA 
AAACAAAAAACAAATATTGGAAGTATT^ 

G^AAAATATATATATTATGATTAG^ATCAAGATTTAGTGATAATTTATGTTATTTTGGGATTTC 

AA^GCCTTTTTAGGCCATTGTCTCAAMAAATAAAAGCAGA^ 

A^AAACATTTCC\TATAATAGCACWCTAAGTGGGTTTTTGNTTC-m 

AGGGCCTTGCCCTNYCACCCAGGNTGGAGTGAAGTGCAGTGGCACGATTTTGGCTCACTGCAG 






SEQUENCE ID NO. 153 

FILE NO. 917-936.GEN 

ATQTTTG ^ CAA^TCTCCGTTCCACCCTTGATTAAATAAGGTAGTATTCATTTTTTAAGTTTTAG 

CrmG^A^ATGTGTAAGTGTC^-TATGCTGTCTAATGAATTAAGACAA 

CCC™acaj>TCTC^ACMAAGAGC\GGCAAGATNC^^ 

T^TCTGCTCTCAGCTAGCTTGCCACCTAGAAJVGACTGGTTGTCNAAGTTGGAGTCCAAGAATCGC 
GGAGGA. T G T TTAAAATGCAGTTTCTCAGGTTCTCNCCACCCACCAGAAGTTTTGATTCATTGAGT 
GG^GGGAGaGGGCAGAGATATTTGCGATTTTAACAGCATTCTCTTGATXGTGATGCAGCTGGTTC 
S CAAATAGGTACC CT AAAGAAATGACAGGTGTT AAATTTAGGATGGCCAT CGCTTGTATGCCGGG 
AGAAGCACACGCTGGGCCCAATTTATATAGGGGCTTTCGTCCTCAGCTCGAGCARCCTCAGAACC 
CCGACAACCYACGCCAGCKCTCTGGGCGGATTCCRTCAGiCTGGGGAAGSCCAGGTGGAGCTCTGG 
KTTCTCCCCGCAATCGTTTCTCCAC^CCGGAC<:-CCCCGCCCCCTTCCTCCrC^CrCCTCCCCTCC 
TCCGTGGGCCGNCCGCCAACGACGCCAGAGCCGGAAATGACGACAACGGTGAGGGTTCTCGGGCG 
GGGCCTGGGACAGGCAGCTCCGGGGTCCZNCGNNWTNAa^TCGGAAACAAAACAGCGGCTGGTCTG 
GAAGGA a CCTGAKCTACGACCCGCGGCGGCAGCGGGGCGGCGGGGAAGCGTATGTGCGTGATGGG 
GAGTCO^CAAGCCAGGAAGGC^CCGCGGACATGGGCGGCCGCGGGCAGGGNCCGGNCCTTTGT 
GGCCGCCCGGGCCGCGAAGCCGGTGTCCTAAAAGATGAGGGGCGGGGCGCGGCCGGTTGGGGCTG 
GGGAACCCCGTGTGGGAAACCAGGAGGGGCGGCCCGTTTCTCGGGCTTCGGGCGCGGCCGGGTGG 
AGAGAGATTCCGGGGAGCCTTGGTCCGGAAATGCTGTTTGCTCGAAGACGTCTCAGGGCGCAGGT 
GCCTTGGGCCGGGATTAGTAGCCGTCTGAACTGGAGTGGAGTAGGAGAAAGAGGAAGCGTCT7GG 
GCTGGGTCTGCTTGAGCAACTGGTGAAACTCCGCGCCTCACGCCCCGGGTGTGTCCTTGTCCAGG 
GGCGACGAGCATTCTGGGCGAAGTCCGCACGCCTCTTGTTCGAGGCGGAAGACGGGGTCTTGATG 
CT^CTCCTTGGTCGGGACTGTCTCGAGGCArGC\TGTCCAGTGACTCTTGTGTTTGCTGCTGCT 
TCCCTCTCAGATTCTTCTCACCGTTGTGGTCAGCTCTGCTTTAGGCATATTAATCCATAGTGGAG 
GCTGGGA^GGGTGAGAGAATTGAGGTGACTTTICCATAATTCAGGTGAGATGTGATTAGAGTYCG 
GATCCTNCGGTGGTGGCAGAGGCTTACCAAGAAACACTAACGGGACATGGGAACCAATTGAGGAT 
CCAGGGAATAAAGTGTGAAGTTGACTAGGAGGTTTTCAGTTTAAGAACATGGCAGAGACATTCTC 
AGAAA T AAGGAAGTTAGGAAGAAAGACCTGGTTTAGAGAGGAGGGCGAGGAAGTGGTTTGGAAGT 
GTCACTTTGGAAGTGCCAGCAGGTGAAAATGCCCTGTGAACAGGACTGGAGCTGAAAACAGGAAT 
CAATTCCATAGATTTCCAGTTGATGTTGGAGCAGTGGAGAAGTCTAAlTCrAAGGAAGGGGAAGAG 
GAGGCCAAGCCAAACACTTAGGAACACTTNCNACGAGGGGGTGGAAGAAGAGCAAGGAGCCAGCT 
GAGGAGAATGAGTGTGGTTGGAGAACCACCACAGCNCAGGGTCGCCAGANCTGAGGAAGGGGAGG 

GAAGCTTATCGAGXAMSGW CRACMKCGAGTTGGCAGGGAT 






SEQUENCE ID NO. 154 
FILE NO. 930-919.GEN 

GTCTTTCCCATCTTCTCCACAGAGTTTG 

CATTGTCAGCTCTTCC\A.TCTCCAT 

GTAC\GCCTTTTATGGACCAATTAGCATT^ 

TCCCATGGATGTTTCTTCTTTGACTATi^^ 

CACATCTAACAAATCAAGATCCCCGGCTGGACTTC 

CTTGCACTATTGGACTTTGGAAGGAGGTGCCTA^ 

TGGACTGTGTCCTCGGTGCAGAAACTACCAGATTTGAGGGACGAGGTCAAGGAGATATGATAGGC 

CCGGAAGTTGCTGTGCCCCATC\GCAGCTTGACGC^^ 

CGAACTCTCAGGACTACCGTTACCAAGAGGTTAGGTGAAGTGGTT^ 

ATCTTAAACTACACGTTGAAAATCAACCCAATAATTCTGTATTAACTGAATTCT 

GAGGTACTGTGAGGAAGAGCAGGCACCACCAGCAGAATGGGGAATGGAGAGGTGGGCAGGGGTTC 

CAGCTTCCCTTTGATTTTTTG 
SEQUENCE ID NO. 155 
FILE NO. 932-943.GEN 

GGATCCGCCOjCCTTGGCCTCC^ 

GTCTGCGATTTCTTGCCAit^ 

ATTCCCTTCTCCTNSWCT^ 

GATAAGATGACATTATAGAATNTNGCAAA^^ 

AAAGATTAGNTTGAGTTTGGGCCAGCATAGAAAAAGGAATGTT^ 

CTC^AGCYCCCCTTTTGSTGX^^ 

TTGGTTGTCTCAGGCGGTTCCTACTTATTGCTAAAGAGTCCTACCTTGAGCTTATAGTAAATTTG 

TCAGTTAGTTGAAAGTCGTGACAAATT^^ 

TGATTGGTNTAAATGNATTTACTAGGATTTAACTA^ 

Aa\CCTAATCTGGGA(X:CT^^ 

GAGAACACATGA^GAMMGGTTTGW^ 

TATAATTGTMTGMACAAA.GTTCTGTTTTTCTTTCCCT7TNCAGAACCTCAAGAGGCTTTGTTTTC 
TGTGAAACAGTATTTCTATACAGNTGCTCCAATGACAGAGTNACCTGCACCGTTGTCCTACTTCC 
AGAATGCACAGATGTCTGAGGACAACCACCT^ 

TYTCTNAAACTGCCTYYGNCAGACTGGATTCACTTATCATCTCCCCTCACCT 

AGGGGGS TAGGNAGGGCTTTCT CTACTTNACCACA.TTTNAT AA.TTATTTTTGGGTGAC CTTCAGC 

TGATCGCTGGGAGGGACACAGGGCTTNTTTA?lC\ 
TCACATTTCANC 
SEQUENCE ID NO. 156 
FILE NO. 951-952.GEN 

CTGCAGCTT^CCTTTAAACTAC-GAAGACrTGTTCCT^ 

AGC\?AT\GCAG7CAAACCCAAATGAAATTTNTACAGATGTTCTGTGTCATTTTATNTTGTTTAT 
GTTG^CTCCCCCACCCCmCCAGTTCACCrGCCATTTATTT 

GTAAAAAGAGACAAAAAACATTAAACTTTTTTCCTTCGTTAATTCCTCCCTAC^ 

AGTTTAGCrCATA.CATTTTATTAGATGTCTTTTATGTTTTTCTTTTNCTAGATTTAGTGGCTGTT 

TNGTGTCCG\AAGGTCCACTTCGTATGCTGGTTGAAACAGCTCAGGAGAGAAATGAAACGCTTT7 

TCCAGCTCTCATT^ACTCCTGTAAGTATTTGGAGAATGATATTGAATTAGTAATCAGNGTAGAAT 

TTATCGGGAACTTGAAGANATGTNACTATGGCAATTTCANGGNACTrGTCrCATCTTAAATGA^ 

GNATCCCTGGACTCCTGUAG 



SEQUENCE ID NO. 157 

FILE NO. 983-1011.GEN 
CCCCGTCTATGCATACTTTGTGTGTC 

ATGGTGTGGTTGGTGAATATGGCAGAJ^GGAGACCCGGAAGCTC^ 

CAAGTATAATGCAGAilAGTAGGTAA^ 

ATAAGCTAACAGTATAGNAATGTTTTTAT^ 

TTGAGAACTATGATAATGCCCAGTAAATACNCAGATA^ 

CCCAACAATACNGTCAAAGCATCCTAGGTTAAGAC^ 

GAAAGGTTCAGGCTGAGGTTATGATTGGGTTTGGGTTTTGGGNNNGTTTTTTATAAGTCATGATT 
TTAAAAAGAAAAAATAAACTCTCTCCAAACATGTAAAAGTAAGAATCTCCTAAA 
SEQUENCE ID NO. 158 
FILE NO. 925-913.GEN 

CAGGAG^GGACTAGGTAJ\ATGNAA^ 
CANCTG^AAXGCTCANCACTXATGGGGAGTACT 

CrTG^GANCAGCCTGGGCAANATGGCGAAACCCTGTCTCTACTAAAAATAGCCANAAWNWAGCCT 

GCGTGGTGGCGCRCA.CGCGTGGTTCCACCTACTCAGGAGGCNTAAGCACGAGNANTNCTTGAACC 

CAGGAGGCAGAGGNTGTGGTGAPvCTGAGATCGTGCCACTGCACTCCAGTCTGGGCGACMAAGTGA 

GACCCTG T CTCCNNNAAGAAAAAAAAAATCTGTACTTTTTAAGGGTTGTGG<rACCTGTTAATTAT 

ATTGAAATGCTTCTYTTCTAGGTCATCCATGCCTGGCTTATTATATCATCTCTATTGTTGCTGCT 

CTTTTTTACATTCAOTACTTGGGCTAA^ 

CCTNNGTGCTGTGTAGCTATCATTTAAAGCCATGTACTTTGNTGATGAATTACTCTGAAGTTTTA 
ATTGTNTCCACATATAGGTCATACTTGGTATATAAAAGACTAGNC^GTATTACTAATTGAGACAT 
TCTTCTGTNGCTCCTNGCTTATAATAAGTAGAACTGAAAGNAACTTAAGACTACAGTTAATTCTA 
AGCCTTTGGGGAAGGATTATATAGCCTTCTAGTAGGAAGTCTTGTGCNATCAGAATGTTTNTAAA 
GAAAGGGTNTCAAGGAATNGTATAAANACCAAAAATAATTGAT 



SEQUENCE ID NO. 159 
FILE NO. 849-892.GEN 



GTTNTCCNAACCAACTTAGGAGNTra 

CTNCAGTTGAGCCGTGATTGCACCCACTTTACTC 

TCCAAACACAAAAACAAAAACAAAaAAAGAGTAAAT^ 

TAGCACAGTTGATATAGGTTATGGTAA^TTATAAAGGTGGGANATTAATATCTAATGTTTGGGA 
G C CAT CA CATTATT CTAAATAATGTTTTGGTGG AAATT ATT GTA CAT CTTTTAAAATCTGTGT AA 
TTTTTTTTCAGGGAAGTGTTTAAAACCIATAACGTTGCTG 

GATCTGGAATTTTGGTGTGGTGGGAATGATTTCCATTCACTGGAAAGGTCCACTTCGACrCCAGC 
AGGCATATCTCATTATGATTAGTGCCCTCATGNCCCTGKTGTTTATCAAGTACCTCCCTGAATGG 
ACTGNGTGGCTCATCTTGGCTGTGATTTC^ 
TCACAGGAATGCCCCACTGGAGTGTTTTCTTTCCT 

TAACGTGTACATCCCATAACTCTTCAGTAAATCATTAATTAGCTATAGTAACTTT^ 

GATTTCGGCTGGGCATGGTAGCTCATGCCTGTAATCTTAGCACTTTGGGAGGCTGAGGCGGGCAG 

ATCACCTAAGCCCAGAGTTCAAGACCAGCCTGGGCAACATGGCAAAACCTCGTATCTA 

TACAAAAATTAGCCGGGCATGGTGGTGCACACCTGTAGTTCCAGCTACTTAGGAGGCTGAGGTGG 
GAGGATCGATTGATCCCAGGAGGTCAAGNCTGCAG 
WE CLAIM: 

1. An isolated DNA molecule comprising a nucleotide sequence coding for 
Alzheimer Related Membrane Protein (ARMP) or a functional fragment or 
variant of the protein. 

2. The DNA molecule of claim 1 wherein the nucleotide sequence codes 
for the amino acid Sequence ID No:2. 

3. The DNA molecule of claim 1 wherein the nucleotide sequence codes 
for the amino acid Sequence ID No: 133. 

4. The DNA molecule of claim 1 wherein the nucleotide sequence codes 
for the amino acid Sequence ID No: 134. 

5. The DNA molecule of claim 1 wherein the nucleotide sequence codes 
for human ARMP and is selected from the group consisting of 

(a) Sequence ID No:1; 

(b) nucleotides 186 to 2764 of Sequence ID No:1; 

(c) Sequence ID No:5; 

(d) Sequence ID No: 132; and 

(e) nucleotides 1 to 1017 and 1117 to 2791 of Sequence ID No:1. 

6. The DNA molecule of claim 1 wherein the nucleotide sequence codes 
for mouse ARMP and has nucleotide Sequence ID No: 134. 

7. An isolated nucleic acid molecule comprising a nucleotide sequence 
selected from the group consisting of 

(a) a deoxyribonucleotide sequence complementary to Sequence ID 
No:1; 

(b) a ribonucleotide sequence complementary to Sequence ID No: 1; 

(c) a ribonucleotide sequence complementary to the 
deoxyribonucleotide sequence of (a) or to the ribonucleotide 
sequence of (b); 

(d) a nucleotide sequence of at least 12 consecutive 
nucleotides capable of hybridising to Sequence ID No:1; and 

(e) a nucleotide sequence capable of hybridising to a nucleotide 
sequence of (d). 

8. The DNA molecule of claim 1 wherein the nucleotide sequence is the 
ARMP-coding nucleotide sequence of ATCC Deposit No. . 

9. An isolated DNA molecule comprising a nucleotide sequence selected 
from the group consisting of 

(a) Sequence ID No:3; 

(b) Sequence ID Nos:6 to 125; 

(c) Sequence ID Nos:126 to 131; 

(d) Sequence ID Nos:138 to 148; and 

(e) Sequence ID Nos:149 to 159. 

10. An isolated DNA molecule comprising a nucleotide sequence coding for 
E5-1 protein or a functional fragment or variant of the protein. 

11. The DNA molecule of claim 10 wherein the nucleotide sequence is 
selected from the group consisting of: 

(a) a nucleotide sequence coding for amino acid Sequence ID No: 
137; and 

(b) nucleotide Sequence ID No: 136. 

12. The DNA molecule of claim 10 wherein the nucleotide sequence is the 
E5-1 coding nucleotide sequence of ATCC Deposit No. . 
13. An isolated nucleic acid molecule comprising a nucleotide sequence 
selected from the group consisting of 

(a) a deoxyribonucleotide sequence complementary to Sequence ID 
No: 134; 

(b) a ribonucleotide sequence complementary to Sequence ID 
No:134; 

(c) a ribonucleotide sequence complementary to the 
deoxyribonucleotide sequence of (a) or to the ribonucleotide 
sequence of (b); 

(d) a nucleotide sequence of at least 12 consecutive 
nucleotides capable of hybridising to Sequence ID No: 134; and 

(e) a nucleotide sequence capable of hybridising to a nucleotide 
sequence of (d). 

14. An isolated nucleic acid molecule comprising a nucleotide sequence 
selected from the group consisting of 

(a) a deoxyribonucleotide sequence complementary to Sequence ID 
No: 136; 

(b) a ribonucleotide sequence complementary to Sequence ID 
No:136; 

(c) a ribonucleotide sequence complementary to the 
deoxyribonucleotide sequence of (a) or to the ribonucleotide 
sequence of (b); 

(d) a nucleotide sequence of at least 12 consecutive 
nucleotides capable of hybridising to Sequence ID No: 136; and 

(e) a nucleotide sequence capable of hybridising to a nucleotide 
sequence of (d). 

15. An isolated DNA molecule comprising a nucleotide sequence coding for 
a mutant form of Alzheimer Related Membrane Protein. 
16. The DNA molecule of claim 15 wherein the nucleotide sequence has at 
least one mutation selected from the group consisting of 

i) 685, A→C ii) 737, A→G iii) 986, C→A 
iv) 1105, C→G v) 1478, G→A vi) 1027, C→T 
vii) 1102, C→T and viii) 1422, C→G. 

17. The DNA molecule of claim 15 wherein the nucleotide sequence 
comprises the nucleotide Sequence ID No:1 or 132 having at least one 
mutation selected from the group consisting of 

i) 685, A→C ii) 737, A→G iii) 986, C→A 
iv) 1105, C→G v) 1478, G→A vi) 1027, C→T 
vii) 1102, C→T and viii) 1422, C→G. 

18. A recombinant vector comprising the DNA molecule of any of claims 1 
to 17. 

19. A host cell transfected with a recombinant vector comprising the DNA 
molecule of any of claims 1 to 17. 

20. Purified Alzheimer Related Membrane Protein or a functional fragment 
or variant thereof. 

21. The protein of claim 20 comprising an amino acid sequence selected 
from the group consisting of 

(a) Sequence ID No:2; 

(b) Sequence ID NO:4; 

(c) Sequence ID No: 133; and 

(d) Sequence ID No: 135. 
22. Substantially pure mutant Alzheimer Related Membrane Protein. 



23. The protein of claim 22 having a mutation selected from the group 
consisting of 

i) M146L; ii) H163R; iii) A246E; 
iv) L286V; v) C410Y; vi) A260V; 
vii) A285V; and viii) L392V. 

24. Substantially pure E5-1 protein. 

25. The protein of claim 25 comprising the amino acid Sequence ID 
No:137. 

26. An isolated DNA molecule comprising a splice variant of the nucleotide 
Sequence ID No: 1. 

27. The DNA molecule of claim 26 comprising nucleotides 1 to 1017 and 
1117 to 2791 of Sequence ID No:1. 

28. A nucleotide sequence which codes for an antigenic determinant of a 
protein selected from the group consisting of Sequence ID No:2, Sequence ID 
No: 4, Sequence ID No: 133, Sequence ID No: 135 and Sequence ID No: 137. 

29. A nucleotide sequence of claim 28, wherein said sequence is selected 
from the group of nucleotide sequences which code for the following protein 
fragments of Sequence ID No:2 consisting of amino acid residues 27-44, 46- 
48, 50-60, 66-67, 107-111, 120-121, 125-126, 155-160, 185-189, 214-223, 
220-230, 240-245, 267-269, 273-282, 300-370 and 400-420. 

30. A polypeptide of at least 6 amino acid residues selected from at least 6 
consecutive amino acid residues of Seq. ID No: 2. 
31. A polypeptide of at least 6 amino acid residues selected from at least 6 
consecutive amino acid residues of Seq. ID No: 13L. 

32. A polypeptide having antigenic properties and selected from the group 
consisting of amino acid residues 27-44, 46-48, 50-60, 66-67, 107-111, 120- 
121, 123-126, 155-160, 185-189, 214-223, 220-230, 240-245, 267-269, 273- 
282, 300-370 and 400-420 of Sequence ID No: 2. 

33. An antibody capable of specific binding to ARMP. 

34. An antibody capable of specific binding to an ARMP extracellular 
domain. 

35. An antibody capable of binding to a polypeptide in accordance with 
claim 29. 

36. A bioassay for determining if a subject has a normal or mutant 
Alzheimer's related membrane protein (ARMP), said bioassay comprising: 

i) providing a biological sample of said subject; 

ii) conducting a biological assay on said sample to detect a normal 
or mutant gene sequence coding for ARMP, a normal or mutant 
ARMP amino acid sequence or a normal or defective protein 
function. 

37. A bioassay of claim 36, wherein said bioassay is a DNA or RNA based 
biological assay. 

38. A bioassay of claim 37, wherein said biological assay is selected from 
the group consisting of probe hybridization, direct DNA sequencing, restriction 
enzyme analysis, electrophoretic mobility, RNase detection, chemical cleavage, 
ligase-mediated detection and PCR amplification. 
39. A bioassay of claim 38, wherein said biological assay detects at least 
one mutation selected from the group consisting of 

i) 685, A→C ii) 737, A→G iii) 986, C→A 
iv) 1105, C→G v) 1478, G→A vi) 1027, C→T 
vii) 1102, C→T and viii) 1422, C→G. 

40. A bioassay of claim 36, wherein said bioassay is amino acid based 
biological assay. 

41. A bioassay of claim 40, wherein said biological assay is selected from 
the group consisting of immunoassay, enzyme site specific assay, 
electrophoretic mobility of cleaved polypeptides. 

42. A bioassay of claim 41, wherein said selected biological assay detects at 
least one mutation selected from the group consisting of: 

i) M146L; ii) H163R; iii) A246E; 
iv) L286V; v) C410Y; vi) A260V; 
vii) A285V; and viii) L392V. 

43. A bioassay of claim 36, wherein said biological assay detects normal or 
defective protein function. 

44. A process for recombinantly producing Alzheimer's related membrane 
protein (ARMP) comprising culturing a host cell of claim 19 under suitable 
conditions to produce said ARMP by expressing said DNA sequence. 

45. A therapeutic composition comprising Alzheimer's related membrane 
protein and a pharmaceutically acceptable carrier. 

46. A recombinant vector for transforming mammalian tissue cells to 
express a therapeutically effective amount of Alzheimer's related membrane 
protein in said cells, said vector being delivered to said cells by a suitable 
vehicle. 

47. A recombinant vector of claim 46, wherein said tissue cells comprise 
liver, kidney, spleen, bone marrow and neurological cells. 

48. A recombinant vector of claim 46, wherein said neurological cells 
comprise central nervous system cells of neuron, brain and vascular cell type. 

49. A recombinant vector of claim 46 wherein said vehicle is selected from 
the group consisting of vaccinia virus, adenovirus, retrovirus, liposome 
transport, neurotropic viruses and Herpes simplex. 

50. A method of treating a patient deficient in normal Alzheimer's related 
membrane protein (ARMP) comprising the step of administering to said patient 
a therapeutically effective amount of said protein targeted at a variety of 
patient cells which normally express ARMP. 

51. A method of claim 50, wherein said variety of patient cells comprise 
heart, brain, lung, liver, skeletal muscle, kidney, pancreas and neurological 
cells. 

52. An immunotherapy for treating a patient having Alzheimer's Disease 
due to cellular production of mutant ARMP, said immunotherapy comprising 
treating said patient with antibodies specific to said mutant ARMP, said 
antibodies binding mutant ARMP to reduce thereby levels of mutant ARMP in 
said patient. 

53. An immunotherapy of claim 52, wherein said antibodies are developed 
by said patient's immune system upon administration to said patient of a 
vaccine comprising said mutant ARMP and a pharmaceutically acceptable 
carrier. 
54. A vaccine composition for invoking an immune response in a human 
susceptible to Alzheimer's Disease, said composition comprising a mutant 
ARMP and a pharmaceutically acceptable carrier. 

55. A method of treating a patient of Alzheimer's Disease comprising 
administering to said patient a therapeutically effective amount of a ligand or 
chemical molecule which corrects symptoms associated with expression of 
mutant ARMP gene. 

56. A transgenic animal model for animal symptoms of Alzheimer's 
disease, said animal model having within its genome a DNA molecule in 
accordance with claim 1 with at least one mutation which when expressed 
results in the presence or absence of mutant ARMP in the animal's cells and 
thereby manifests the symptoms. 

57. A transgenic animal model of claim 56, wherein said mutation is at 
least one nucleotide mutation. 

58. A transgenic animal model of claim 56, wherein said polynucleotide 
sequence is Sequence ID No: 1. 

59. A transgenic animal model of claim 56, wherein said polynucleotide 
sequence is Sequence ID No: 3. 

60. A transgenic animal of claim 58, wherein said sequence mutations are 
selected from the group consisting of: 

i) 685, A→C ii) 737, A→G iii) 986, C→A 
iv) 1105, C→G v) 1478, G→A vi) 1027, C→T 
vii) 1102, C→T and viii) 1422, C→G. 

61. A transgenic animal of claim 58, wherein said animal is a rodent. 
62. A transgenic animal of claim 58, wherein said rodent is a mouse. 

63. A transgenic mouse model for Alzheimer's disease, said mouse having 
said mammalian polynucleotide of claim 13 with a mutation which when 
expressed results in the presence or absence of a mutant ARMP in the animal's 
cells and thereby manifests the syndrome. 

64. A transgenic mouse exhibiting a physiological or neurological disorder 
which can be linked to one or more mutations in the mouse ARMP gene or mouse 
ARMP. 

65. A transgenic mouse of claim 64 exhibiting symptoms of cognitive, 
memory or behavioural disturbances. 

66. A transgenic mouse of claim 65 exhibiting tissue cell disorders in heart, 
brain, lung, liver, skeletal muscle, kidney, pancreas and neurological cells. 

67. Use of a transgenic animal of claim 56 for screening proteins, ligands 
and chemical molecules for efficacy in reversing effects of Alzheimer's 
Disease. 

68. Use of a transgenic mouse of claim 63 for screening proteins, ligands 
and chemical molecules for efficacy in reversing effects of Alzheimer's 
Disease. 

69. An isolated DNA molecule comprising a nucleotide sequence coding for 
a mutant form of E5-1 protein. 

70. The DNA molecule of claim 69 wherein the nucleotide sequence has at 
least one mutation selected from the group consisting of: 

i) 787, A→T and 

ii) 1080, A→G. 
71. Substantially pure mutant E5-1 protein. 

72. The protein of claim 71 having a mutation selected from the group 
consisting of: 

i) Asn141Ile and 

ii) Met239Val. 
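
The claims above rely on three notational conventions: complementary deoxyribonucleotide and ribonucleotide sequences (claims 7, 13 and 14), nucleotide ranges such as "nucleotides 1 to 1017 and 1117 to 2791" (claims 5 and 27), and point mutations written as a position followed by a base change (claims 16, 17, 39, 60 and 70). The following is a minimal illustrative sketch of how those conventions might be applied to a sequence held as text. It assumes 1-based, inclusive numbering and complementary strands written 5' to 3'; the helper names and the toy sequence are hypothetical and are not taken from the specification or from its sequence listing.

```python
# Illustrative sketch only: helper names and the toy sequence are hypothetical
# and are not taken from the specification or its sequence listing.

_DNA_PAIRS = str.maketrans("ACGT", "TGCA")


def dna_complement(seq: str) -> str:
    """Complementary deoxyribonucleotide strand, written 5'->3' (reverse complement)."""
    return seq.upper().translate(_DNA_PAIRS)[::-1]


def rna_complement(seq: str) -> str:
    """Complementary ribonucleotide strand, written 5'->3' (U in place of T)."""
    return dna_complement(seq).replace("T", "U")


def subsequence(seq: str, start: int, end: int) -> str:
    """Nucleotides start..end, assuming 1-based inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside a sequence of length {len(seq)}")
    return seq[start - 1:end]


def join_segments(seq: str, segments: list[tuple[int, int]]) -> str:
    """Concatenate several 1-based inclusive ranges, as for a splice variant."""
    return "".join(subsequence(seq, start, end) for start, end in segments)


def apply_point_mutation(seq: str, position: int, ref: str, alt: str) -> str:
    """Apply a substitution written as 'position, ref->alt' (1-based position)."""
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    found = seq[position - 1]
    if found != ref:
        raise ValueError(f"expected {ref} at position {position}, found {found}")
    return seq[:position - 1] + alt + seq[position:]


if __name__ == "__main__":
    toy = "ATGGCCAAGTTTACG"  # hypothetical 15-nucleotide example
    print(dna_complement(toy))
    print(rna_complement(toy))
    print(join_segments(toy, [(1, 3), (7, 9)]))
    print(apply_point_mutation(toy, 7, "A", "C"))
```

Under these assumptions, a call such as `join_segments(seq, [(1, 1017), (1117, 2791)])` on a sequence of at least 2791 nucleotides would return 2692 nucleotides, and `apply_point_mutation(seq, 685, "A", "C")` would fail with an error if position 685 did not hold an A.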



GENETIC SEQUENCES AND PROTEINS RELATED TO ALZHEIMER'S DISEASE 

Abstract of the Disclosure 



The present invention describes the identification, isolation, cloning, and determination 
of the Alzheimer Related Membrane Protein (ARMP) gene on chromosome 14 and a related 
gene, E5-1, on chromosome 1. Normal and mutant copies of both genes are presented. 
Transcripts and products of these genes are useful in detecting and diagnosing Alzheimer's 
disease, developing therapeutics for treatment of Alzheimer's disease, as well as the isolation and 
manufacture of the protein and the construction of transgenic animals expressing the mutant 
genes. 
DECLARATION 



I hereby declare that all statements made herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true; and further that these statements were made with 
the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or 
both, under Section 1001 of Title 18 of the United States Code, and that such willful false statements may 
jeopardize the validity of the application or any patent issued thereon. 



SIGNATURE(S) 



Full name of sole or first inventor: Peter H. St. George-Hyslop 

Inventor's signature: 

Date: 

Residence: 210 Richview Avenue, Toronto, M5P 3G3 

Citizenship: Canadian & UK 

Post Office Address: Same 



Full name of sole or second inventor (if any): Johanna M. Rommens 

Inventor's signature: 

Date: 

Residence: 105 McCaul Street, Toronto M5T 2X4 

Citizenship: 

Post Office Address: Same 



Full name of inventor: Paul E. Fraser 

Inventor's signature: 

Date: 

Residence: 611 Windermere Avenue, Toronto, Ontario M6S 3L9 

Citizenship: Canadian 

Post Office Address: Same 
CERTIFICATE OF FACSIMILE TRANSMISSION: 
(703) 308-8724 



I hereby certify that this paper is being facsimile 
transmitted to the United States Patent and 
Trademark Office on the date shown below. 

Name of Person signing Certification 

Signature 

Date 



File No.: 1034/1F808-US2 



In re Application of: Peter H. ST. GEORGE-HYSLOP, et al. 

Serial No: 08/509,359 Examiner: P. Duffy 

Filed: July 31, 1995 Group Art Unit: 1645 

For: GENETIC SEQUENCES AND PROTEINS RELATED TO ALZHEIMER'S DISEASE 



ASSOCIATE POWER OF ATTORNEY 



Hon. Commissioner of 
Patents and Trademarks 
Washington, DC 20231 

Sir: 

The undersigned attorney of record in the above-identified application hereby appoints: 

Michael J. Sweedler (No. 19,937) 
S. Peter Ludwig (No. 25,351) 
Paul Fields (No. 20,298) 
Joseph B. Lerch (No. 26,936) 
Melvin C. Garner (No. 26,272) 
Ethan Horwitz (No. 27,646) 
Beverly B. Goodwin (No. 28,417) 
Martin E. Goldstein (No. 20,869) 



Adda C. Gogoria (No. 29,714) 
Bert J. Lewen (No. 19,407) 
Henry Sternberg (No. 22,408) 
Peter C. Schechter (No. 31,662) 
Robert Schaffer (No. 31,194) 
Robert C. Sullivan, Jr. (No. 30,499) 
Ira J. Levy (No. 35,587) 
Joseph R. Robinson (No. 33,448) 
Paul F. Fehlner (No. 35,135) 
Phone: (212)527-7700 
Fax: (212)753-6237 



of Darby & Darby, P.C., located at 805 Third Avenue, New York, New York 10022 and 



Immac J. Thampoe (No. 36,322) 



of Schering-Plough Corporation, 2000 Galloping Hill Road, Kenilworth, New Jersey 
07033-0530, as associate attorney with full power to prosecute said application, and to transact 
all business in the United States Patent and Trademark Office in connection therewith. 

Please forward all communications to: 



Paul F. Fehlner, Ph.D. 
Darby & Darby P.C. 
805 Third Avenue 
New York, New York 10022 




Cynthia L. Foulke 
Reg. No. 32,364 



Dated: July __, 1999 